
Research Article
Construction of Morpheme-Based Amharic Stopword List for Information Retrieval System
@INPROCEEDINGS{10.1007/978-3-030-80621-7_35,
  author={Tilahun Yeshambel and Josiane Mothe and Yaregal Assabie},
  title={Construction of Morpheme-Based Amharic Stopword List for Information Retrieval System},
  proceedings={Advances of Science and Technology. 8th EAI International Conference, ICAST 2020, Bahir Dar, Ethiopia, October 2-4, 2020, Proceedings, Part I},
  proceedings_a={ICAST},
  year={2021},
  month={7},
  keywords={Morphological analysis Corpus statistics Semantics Complex-language Amharic Stopword},
  doi={10.1007/978-3-030-80621-7_35}
}
Tilahun Yeshambel
Josiane Mothe
Yaregal Assabie
Year: 2021
ICAST
Springer
DOI: 10.1007/978-3-030-80621-7_35
Abstract
One of the major forms of pre-processing in information retrieval and many other text processing applications is filtering out stopwords. Many retrieval systems ignore stopwords during indexing and retrieval in order to enhance retrieval effectiveness and efficiency. The aim of this paper is to present the construction of a morpheme-based Amharic stopword list and investigate its effect on information retrieval tasks. The stopword list is constructed based on the semantics of Amharic words and corpus statistics: frequency, mean, variance, and entropy parameters. The stopword list is evaluated using Lemur on an Amharic information retrieval test collection. Removal of stopwords has shown a significant impact on retrieval effectiveness, the size of the index, and the term weighting of non-stopwords. On the other hand, their presence in the index and query negatively affects the retrieval effectiveness of the Amharic retrieval system. The average precisions of retrieving with and without stopwords using language modeling on the root-based approach are 0.24 and 0.70, respectively.
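The corpus statistics named in the abstract (frequency, mean, variance, and entropy) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the exact definitions used in the paper are not given here, so the sketch assumes one common formulation, in which mean and variance are taken over a term's within-document probability and entropy is taken over the term's distribution across documents. The function names `term_statistics` and `candidate_stopwords`, the ranking rule, and the toy tokens are hypothetical.

```python
import math
from collections import Counter


def term_statistics(docs):
    """Compute per-term corpus statistics for a list of tokenized documents.

    Assumed definitions (not taken from the paper):
      frequency: total occurrences of the term in the corpus
      mean:      average within-document probability of the term
      variance:  variance of that within-document probability
      entropy:   Shannon entropy of the term's spread over documents
    """
    n_docs = len(docs)
    counts = [Counter(doc) for doc in docs]
    total = Counter()
    for c in counts:
        total.update(c)

    stats = {}
    for term, freq in total.items():
        # Within-document probability; 0.0 for empty documents.
        probs = [c[term] / len(doc) if doc else 0.0
                 for c, doc in zip(counts, docs)]
        mean = sum(probs) / n_docs
        variance = sum((p - mean) ** 2 for p in probs) / n_docs
        # Share of the term's occurrences that falls in each document.
        shares = [c[term] / freq for c in counts if c[term] > 0]
        entropy = -sum(s * math.log2(s) for s in shares)
        stats[term] = {
            "frequency": freq,
            "mean": mean,
            "variance": variance,
            "entropy": entropy,
        }
    return stats


def candidate_stopwords(docs, top_k=10):
    """Rank terms by entropy, then frequency (a hypothetical ranking rule).

    Terms spread evenly over many documents score high on entropy,
    which makes them stopword candidates for later manual review.
    """
    stats = term_statistics(docs)
    ranked = sorted(
        stats,
        key=lambda t: (stats[t]["entropy"], stats[t]["frequency"]),
        reverse=True,
    )
    return ranked[:top_k]


# Toy corpus of placeholder tokens; "f" stands in for a function morpheme.
docs = [["f", "x", "f"], ["f", "y"], ["f", "z", "f", "f"]]
top = candidate_stopwords(docs, top_k=1)
```

In this toy corpus, `f` occurs in every document and so ranks first, while `x`, `y`, and `z` each occur in a single document and have zero entropy. In practice the candidates would still need the semantic check that the abstract describes, since statistics alone cannot tell a frequent content morpheme from a true stopword.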