Information and Communication Technology for Development for Africa. First International Conference, ICT4DA 2017, Bahir Dar, Ethiopia, September 25–27, 2017, Proceedings

Research Article

A Finite-State Morphological Analyzer for Wolaytta

  • @INPROCEEDINGS{10.1007/978-3-319-95153-9_2,
        author={Tewodros Gebreselassie and Jonathan Washington and Michael Gasser and Baye Yimam},
        title={A Finite-State Morphological Analyzer for Wolaytta},
        proceedings={Information and Communication Technology for Development for Africa. First International Conference, ICT4DA 2017, Bahir Dar, Ethiopia, September 25--27, 2017, Proceedings},
        proceedings_a={ICT4DA},
        year={2018},
        month={7},
        keywords={Wolaytta language, Morphological analysis and generation, HFST, Apertium, NLP},
        doi={10.1007/978-3-319-95153-9_2}
    }
    
  • Tewodros Gebreselassie
    Jonathan Washington
    Michael Gasser
    Baye Yimam
    Year: 2018
    A Finite-State Morphological Analyzer for Wolaytta
    ICT4DA
    Springer
    DOI: 10.1007/978-3-319-95153-9_2
Tewodros Gebreselassie1,*, Jonathan Washington2,*, Michael Gasser3,*, Baye Yimam1
  • 1: Addis Ababa University
  • 2: Swarthmore College
  • 3: Indiana University
*Contact email: wolaytta.boditti@gmail.com, jonathan.washington@swarthmore.edu, gasser@indiana.edu

Abstract

This paper presents the development of a free/open-source finite-state morphological transducer for Wolaytta, an Omotic language of Ethiopia, using the Helsinki Finite-State Transducer toolkit (HFST). Developing a full-fledged morphological analysis tool for an under-resourced language like Wolaytta is an important step towards developing further NLP (Natural Language Processing) applications. Morphological analyzers for highly inflectional languages are most efficiently developed using finite-state transducers. To develop the transducer, a lexicon of root words was obtained semi-automatically. The morphotactics of the language were implemented by hand in the lexc formalism, and morphophonological rules were implemented in the twol formalism. Evaluation of the transducer shows that it has decent coverage (over 80%) of forms in a large corpus and exhibits high precision (94.85%) and recall (94.11%) over a manually verified test set. To the best of our knowledge, this work is the first systematic and exhaustive implementation of the morphology of Wolaytta in a morphological transducer.
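
The abstract reports coverage, precision, and recall but does not spell out how they were computed. The following is a minimal Python sketch of one common way such figures might be obtained for a morphological analyzer. It assumes that coverage is the share of corpus tokens receiving at least one analysis, and that precision and recall are computed over sets of analyses per word form against a hand-verified gold standard. The function names, the tag notation, and the toy forms are hypothetical placeholders, not taken from the paper and not real Wolaytta data.

```python
from typing import Dict, Iterable, Set, Tuple


def coverage(tokens: Iterable[str], analyses: Dict[str, Set[str]]) -> float:
    """Fraction of corpus tokens that receive at least one analysis (assumed definition)."""
    token_list = list(tokens)
    if not token_list:
        return 0.0
    analysed = sum(1 for t in token_list if analyses.get(t))
    return analysed / len(token_list)


def precision_recall(
    gold: Dict[str, Set[str]], predicted: Dict[str, Set[str]]
) -> Tuple[float, float]:
    """Precision and recall over analyses, summed across all gold word forms."""
    tp = fp = fn = 0
    for form, gold_set in gold.items():
        pred_set = predicted.get(form, set())
        tp += len(gold_set & pred_set)   # analyses the transducer got right
        fp += len(pred_set - gold_set)   # analyses it produced but gold rejects
        fn += len(gold_set - pred_set)   # gold analyses it failed to produce
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


# Hypothetical toy data: placeholder forms and tags, for illustration only.
gold = {
    "form_a": {"a<n><sg>"},
    "form_b": {"b<v><pres>", "b<n><pl>"},
}
predicted = {
    "form_a": {"a<n><sg>", "a<v><imp>"},
    "form_b": {"b<v><pres>"},
}
corpus = ["form_a", "form_b", "form_c", "form_a"]

cov = coverage(corpus, predicted)
prec, rec = precision_recall(gold, predicted)
print(f"coverage={cov:.2f} precision={prec:.2f} recall={rec:.2f}")
```

In practice the `predicted` mapping would be filled by running each word form through the compiled transducer. Other conventions, such as scoring per token rather than per analysis, would give different numbers.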