Advances of Science and Technology. 8th EAI International Conference, ICAST 2020, Bahir Dar, Ethiopia, October 2-4, 2020, Proceedings, Part I

Research Article

Construction of Morpheme-Based Amharic Stopword List for Information Retrieval System

  • @INPROCEEDINGS{10.1007/978-3-030-80621-7_35,
        author={Tilahun Yeshambel and Josiane Mothe and Yaregal Assabie},
        title={Construction of Morpheme-Based Amharic Stopword List for Information Retrieval System},
        proceedings={Advances of Science and Technology. 8th EAI International Conference, ICAST 2020, Bahir Dar, Ethiopia, October 2-4, 2020, Proceedings, Part I},
        proceedings_a={ICAST},
        year={2021},
        month={7},
        keywords={Morphological analysis, Corpus statistics, Semantics, Complex-language, Amharic, Stopword},
        doi={10.1007/978-3-030-80621-7_35}
    }
    
  • Tilahun Yeshambel
    Josiane Mothe
    Yaregal Assabie
    Year: 2021
    Construction of Morpheme-Based Amharic Stopword List for Information Retrieval System
    ICAST
    Springer
    DOI: 10.1007/978-3-030-80621-7_35
Tilahun Yeshambel1,*, Josiane Mothe2, Yaregal Assabie3
  • 1: IT PhD Program
  • 2: INSPE, Univ. de Toulouse, IRIT
  • 3: Department of Computer Science
*Contact email: tilahun.yeshambel@uog.edu.et

Abstract

Filtering out stopwords is one of the major pre-processing steps in information retrieval and many other text processing applications. Many retrieval systems ignore stopwords during indexing and retrieval in order to improve retrieval effectiveness and efficiency. This paper presents the construction of a morpheme-based Amharic stopword list and investigates its effect on information retrieval tasks. The list is constructed based on the semantics of Amharic words and on corpus statistics: frequency, mean, variance, and entropy. It is evaluated using Lemur on an Amharic information retrieval test collection. Removing stopwords has a significant impact on retrieval effectiveness, index size, and the term weighting of non-stopwords, whereas their presence in the index and in queries degrades the retrieval effectiveness of the Amharic retrieval system. Using language modeling with the root-based approach, the average precision is 0.24 with stopwords and 0.70 without them.
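The abstract names the corpus statistics used (frequency, mean, variance, entropy) but does not give their formulas. The sketch below shows one common way such statistics could be computed over a tokenized corpus; it is an illustration under stated assumptions, not the authors' implementation. The function names `term_statistics` and `candidate_stopwords`, the use of within-document relative frequency for the mean and variance, the entropy definition, and the ranking rule are all assumptions. In a morpheme-based setting the tokens would be the morphemes produced by a morphological analyzer; placeholder ASCII tokens are used here.

```python
import math
from collections import Counter


def term_statistics(docs):
    """Compute per-term corpus statistics from a list of tokenized documents."""
    n_docs = len(docs)
    doc_counts = [Counter(tokens) for tokens in docs]
    doc_lengths = [len(tokens) for tokens in docs]
    total = Counter()
    for counts in doc_counts:
        total.update(counts)

    stats = {}
    for term, freq in total.items():
        # Relative frequency of the term inside each document (0 if absent).
        probs = [
            counts[term] / length if length else 0.0
            for counts, length in zip(doc_counts, doc_lengths)
        ]
        mean = sum(probs) / n_docs
        variance = sum((p - mean) ** 2 for p in probs) / n_docs
        # Entropy of how the term's occurrences are spread over documents
        # (assumed definition: higher means more evenly spread).
        shares = [counts[term] / freq for counts in doc_counts if counts[term]]
        entropy = -sum(s * math.log2(s) for s in shares)
        stats[term] = {
            "frequency": freq,
            "mean": mean,
            "variance": variance,
            "entropy": entropy,
        }
    return stats


def candidate_stopwords(stats, top_k):
    """Rank terms by entropy, then frequency (a hypothetical ranking rule)."""
    ranked = sorted(
        stats,
        key=lambda t: (stats[t]["entropy"], stats[t]["frequency"]),
        reverse=True,
    )
    return ranked[:top_k]
```

A term that occurs often and is spread evenly across documents scores high on both frequency and entropy, which is the usual statistical signature of a stopword. Per the abstract, the final list also relies on the semantics of Amharic words, which a purely statistical ranking like this does not capture.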

Keywords
Morphological analysis, Corpus statistics, Semantics, Complex-language, Amharic, Stopword
Published
2021-07-15
Appears in
SpringerLink
http://dx.doi.org/10.1007/978-3-030-80621-7_35