Research Article
An Extractive Multi-document Summarization System for Malayalam News Documents
@INPROCEEDINGS{10.4108/eai.27-2-2017.152340,
  author={Manju K and David Peter S and Sumam Mary idicula},
  title={An Extractive Multi-document Summarization System for Malayalam News Documents},
  proceedings={First EAI International Conference on Computer Science and Engineering},
  publisher={EAI},
  proceedings_a={COMPSE},
  year={2017},
  month={3},
  keywords={Multidocument Summarization Malayalam Language Sentence Scoring Extractive Heuristic measures Word Net},
  doi={10.4108/eai.27-2-2017.152340}
}
Manju K
David Peter S
Sumam Mary idicula
Year: 2017
COMPSE
EAI
DOI: 10.4108/eai.27-2-2017.152340
Abstract
The flood of digital data calls for a system that can take information from multiple documents and present it in summarized form. Because no automatic tool is available for summarizing Malayalam documents, this work serves as an introduction to the problem. We investigate an extractive multi-document summarizer for the Malayalam language that uses a sentence scoring technique. An online Malayalam WordNet is used for semantic similarity checking. The score of each sentence is calculated from the features selected for it. Feature selection considers heuristic measures such as sentence length, sentence position, the presence of numerical data, the presence of a proper noun in the sentence, and term frequency-inverse document frequency (TF-IDF) across the documents. The top-ranking sentences are selected as the initial summary. A cosine similarity measure is then applied to remove redundant sentences, and the final summary is generated to the specified length. Experimental results demonstrate the effectiveness of the proposed system on the data set selected as a benchmark.
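The pipeline described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the equal feature weights, the whitespace tokenizer, the similarity threshold of 0.7, and all function names (`tokenize`, `tfidf_weights`, `score_sentence`, `cosine`, `summarize`) are assumptions made for illustration. Proper nouns are supplied as a hypothetical precomputed set because detecting them in Malayalam would need a tagger, and the WordNet-based semantic similarity step is omitted.

```python
import math
from collections import Counter


def tokenize(sentence):
    # Assumption: whitespace splitting with punctuation stripped. A real
    # Malayalam system would need a proper tokenizer and stemmer.
    words = (w.strip(".,;:!?\"'()") for w in sentence.split())
    return [w for w in words if w]


def tfidf_weights(docs_tokens):
    # One {term: tf-idf} dict per document, with smoothed idf.
    n_docs = len(docs_tokens)
    df = Counter()
    for tokens in docs_tokens:
        df.update(set(tokens))
    weights = []
    for tokens in docs_tokens:
        tf = Counter(tokens)
        total = len(tokens) or 1
        weights.append({
            t: (c / total) * (math.log((1 + n_docs) / (1 + df[t])) + 1.0)
            for t, c in tf.items()
        })
    return weights


def score_sentence(tokens, position, n_sentences, has_number,
                   proper_nouns, weights, max_len):
    # Assumption: the five heuristic features are summed with equal weight.
    length = len(tokens) / max_len if max_len else 0.0
    pos = 1.0 - position / n_sentences  # earlier sentences score higher
    numeric = 1.0 if has_number else 0.0
    proper = 1.0 if any(t in proper_nouns for t in tokens) else 0.0
    tfidf = (sum(weights.get(t, 0.0) for t in tokens) / len(tokens)
             if tokens else 0.0)
    return length + pos + numeric + proper + tfidf


def cosine(a, b):
    # Cosine similarity between two bag-of-words token lists.
    ca, cb = Counter(a), Counter(b)
    dot = sum(ca[t] * cb[t] for t in ca)
    norm = (math.sqrt(sum(v * v for v in ca.values()))
            * math.sqrt(sum(v * v for v in cb.values())))
    return dot / norm if norm else 0.0


def summarize(documents, max_sentences=3, threshold=0.7,
              proper_nouns=frozenset()):
    # documents: list of documents, each a list of sentence strings.
    tokenized = [[tokenize(s) for s in doc] for doc in documents]
    weights = tfidf_weights([[t for sent in doc for t in sent]
                             for doc in tokenized])
    max_len = max((len(sent) for doc in tokenized for sent in doc), default=0)

    scored = []
    for d, doc in enumerate(documents):
        for i, sentence in enumerate(doc):
            tokens = tokenized[d][i]
            has_number = any(ch.isdigit() for ch in sentence)
            score = score_sentence(tokens, i, len(doc), has_number,
                                   proper_nouns, weights[d], max_len)
            scored.append((score, sentence, tokens))
    scored.sort(key=lambda item: item[0], reverse=True)

    # Walk the ranking and skip sentences too similar to one already kept.
    summary = []
    for _, sentence, tokens in scored:
        if len(summary) == max_sentences:
            break
        if all(cosine(tokens, kept) < threshold for _, kept in summary):
            summary.append((sentence, tokens))
    return [sentence for sentence, _ in summary]
```

Calling `summarize(documents, max_sentences=5)` on a list of documents, each given as a list of sentences, returns at most five sentences in score order. Raising `threshold` keeps more near-duplicate sentences, while lowering it removes redundancy more aggressively.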