Research Article
From web to SMS: A text summarization of Wikipedia pages with character limitation
@ARTICLE{10.4108/eai.11-6-2020.165277,
  author={J.L.E.K Fendji and B.A.H. Aminatou},
  title={From web to SMS: A text summarization of Wikipedia pages with character limitation},
  journal={EAI Endorsed Transactions on Creative Technologies},
  volume={7},
  number={24},
  publisher={EAI},
  journal_a={CT},
  year={2020},
  month={6},
  keywords={Character-limitation summarization, SMS, LSA, TextRank, ROUGE, TACOS, Wikipedia},
  doi={10.4108/eai.11-6-2020.165277}
}
Authors: J.L.E.K Fendji, B.A.H. Aminatou
Year: 2020
Journal: EAI Endorsed Transactions on Creative Technologies (CT)
Publisher: EAI
DOI: 10.4108/eai.11-6-2020.165277
Abstract
Wikipedia is one of the main sources of information on the Web. However, accessing this content can be difficult, especially from a basic telephone without browsing capability that has access only to a GSM network. In that setting, SMS remains the only means of text-based communication. Because an SMS is limited in its number of characters, a Wikipedia page cannot always be sent through SMS. This work raises the issue of text summarization with character limitation. To address it, two extractive approaches have been combined: the LSA and TextRank algorithms. The generated summaries have been evaluated using ROUGE metrics. Since ROUGE metrics do not consider character limitation, a new threshold, the Threshold of Acceptability for Character-Oriented Summaries (TACOS), has been proposed to interpret the ROUGE scores. The evaluation showed the relevance of the approach for pages of at most 2000 characters. The system has been tested using the SMS simulator of RapidSMS, without a GSM gateway, to simulate deployment in a real environment. To the best of our knowledge, this is the first work to tackle text summarization with character limitation.
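The idea of extractive summarization under a character budget can be illustrated with a minimal sketch. It is not the authors' system: the abstract does not say how LSA and TextRank are combined, so the sketch ranks sentences with a simplified TextRank only (word-overlap similarity plus power iteration) and omits LSA. The function names, the naive sentence splitter, the greedy selection step, and the default budget of 160 characters (the usual size of a single GSM 7-bit SMS) are all assumptions made for illustration.

```python
import math
import re
from itertools import combinations


def split_sentences(text):
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def tokens(sentence):
    """Lowercased set of alphanumeric words in a sentence."""
    return set(re.findall(r"[a-z0-9]+", sentence.lower()))


def similarity(a, b):
    """Word-overlap similarity in the style of TextRank (on unique words)."""
    ta, tb = tokens(a), tokens(b)
    if len(ta) < 2 or len(tb) < 2:
        return 0.0  # avoids a zero denominator for one-word sentences
    return len(ta & tb) / (math.log(len(ta)) + math.log(len(tb)))


def textrank(sentences, damping=0.85, iterations=50):
    """Score sentences by weighted PageRank over the similarity graph."""
    n = len(sentences)
    weights = [[0.0] * n for _ in range(n)]
    for i, j in combinations(range(n), 2):
        weights[i][j] = weights[j][i] = similarity(sentences[i], sentences[j])
    out_sum = [sum(row) for row in weights]
    scores = [1.0] * n
    for _ in range(iterations):
        scores = [
            (1 - damping) + damping * sum(
                weights[j][i] / out_sum[j] * scores[j]
                for j in range(n) if out_sum[j] > 0
            )
            for i in range(n)
        ]
    return scores


def summarize(text, max_chars=160):
    """Greedily keep the best-ranked sentences that fit in max_chars.

    max_chars=160 is an assumed budget (one GSM 7-bit SMS), not a value
    taken from the article.
    """
    sentences = split_sentences(text)
    scores = textrank(sentences)
    ranked = sorted(range(len(sentences)), key=lambda i: -scores[i])
    chosen, used = [], 0
    for i in ranked:
        cost = len(sentences[i]) + (1 if chosen else 0)  # +1 for the joining space
        if used + cost <= max_chars:
            chosen.append(i)
            used += cost
    # Restore document order so the summary reads naturally.
    return " ".join(sentences[i] for i in sorted(chosen))
```

Calling `summarize(page_text, max_chars=160)` returns a string of whole sentences that never exceeds the budget. Selecting whole sentences keeps the output readable, but a page whose sentences are all longer than the budget would yield an empty summary, which hints at why character-limited summarization needs its own treatment.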
Copyright © 2020 J.L.E.K Fendji et al., licensed to EAI. This is an open access article distributed under the terms of the Creative Commons Attribution licence (http://creativecommons.org/licenses/by/3.0/), which permits unlimited use, distribution and reproduction in any medium so long as the original work is properly cited.