Artificial Intelligence and Digitalization for Sustainable Development. 10th EAI International Conference, ICAST 2022, Bahir Dar, Ethiopia, November 4-6, 2022, Proceedings

Research Article

Amharic Sentence-Level Word Sense Disambiguation Using Transfer Learning

Cite
  • @INPROCEEDINGS{10.1007/978-3-031-28725-1_14,
        author={Neima Mossa and Million Meshesha},
        title={Amharic Sentence-Level Word Sense Disambiguation Using Transfer Learning},
        proceedings={Artificial Intelligence and Digitalization for Sustainable Development. 10th EAI International Conference, ICAST 2022, Bahir Dar, Ethiopia, November 4-6, 2022, Proceedings},
        proceedings_a={ICAST},
        year={2023},
        month={3},
        keywords={Word sense disambiguation; Transfer learning; Neural network; Pre-trained language model; Natural language preprocessing; Morphological analyzer; Amharic WSD},
        doi={10.1007/978-3-031-28725-1_14}
    }
    
  • Neima Mossa
    Million Meshesha
    Year: 2023
    Amharic Sentence-Level Word Sense Disambiguation Using Transfer Learning
    ICAST
    Springer
    DOI: 10.1007/978-3-031-28725-1_14
Neima Mossa1,*, Million Meshesha2
  • 1: Faculty of Computing, Bahir Dar Institute of Technology
  • 2: School of Information Science
*Contact email: neimamussa32@gmail.com

Abstract

Word sense disambiguation (WSD) plays an important role in increasing the performance of NLP applications such as information extraction, information retrieval, and machine translation. The manual disambiguation process by humans is tedious, error-prone, and expensive. Recent research on Amharic WSD has mostly used handcrafted rules. Such approaches cannot automatically learn different representations of the target word from data, and they consider only a limited window of surrounding words in the sentence. The main drawback of previous works is that the sense of a word cannot be detected from the synset list unless the word is explicitly mentioned there. Our study explores and designs an Amharic WSD model employing transformer-based contextual embeddings, namely AmRoBERTa. As there is no standard sense-tagged Amharic text dataset for the Amharic WSD task, we first compiled 800 ambiguous words. We then collected more than 33k sentences containing those ambiguous words, which we used to fine-tune our transformer-based AmRoBERTa model. We conducted two types of annotation for our WSD experiments. First, with the help of linguistic experts, we annotated 10k sentences for 7 types of word relations (synonymy, hyponymy, hypernymy, meronymy, holonymy, toponymy, and homonymy). For the WSD disambiguation experiment, we chose 10 target words and annotated a total of 1000 sentences with their correct sense using the WebAnno annotation tool. For the classification task, the CNN, Bi-LSTM, and BERT-based classification models achieve accuracies of 90%, 88%, and 93%, respectively. For the WSD task, we conducted two experiments. When we use the masking technique of the pre-trained contextual embedding to find the correct sense, the model attains 70% accuracy. When we instead use the FLAIR document embedding framework to embed the target sentences and glosses separately and compute their similarities, our model achieves 71% accuracy in correctly disambiguating target words.
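The gloss-similarity experiment described above reduces to an argmax over similarity scores: embed the sentence containing the ambiguous word, embed each candidate gloss, and pick the gloss closest to the sentence. The sketch below shows only that selection step with toy vectors standing in for the FLAIR document embeddings the paper uses; the embedding call itself is omitted, and `pick_sense` is a hypothetical helper name, not from the paper.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def pick_sense(sentence_vec, gloss_vecs):
    """Return the index of the gloss whose embedding is most similar
    to the sentence embedding (argmax over cosine similarity)."""
    sims = [cosine(sentence_vec, g) for g in gloss_vecs]
    return max(range(len(sims)), key=sims.__getitem__)

# Toy vectors in place of real document embeddings: the sentence vector
# is far more similar to the second gloss than to the first.
sentence = [0.9, 0.1, 0.0]
glosses = [[0.0, 1.0, 0.0], [1.0, 0.2, 0.0]]
print(pick_sense(sentence, glosses))  # → 1
```

In the paper's setup the vectors would come from embedding each sentence and gloss with a FLAIR document embedding over the fine-tuned AmRoBERTa model; only the final similarity comparison is shown here.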

Keywords
Word sense disambiguation; Transfer learning; Neural network; Pre-trained language model; Natural language preprocessing; Morphological analyzer; Amharic WSD
Published
2023-03-19
Appears in
SpringerLink
http://dx.doi.org/10.1007/978-3-031-28725-1_14
Copyright © 2022–2025 ICST