Information and Communication Technology for Development for Africa. First International Conference, ICT4DA 2017, Bahir Dar, Ethiopia, September 25–27, 2017, Proceedings

Research Article

Experimenting Statistical Machine Translation for Ethiopic Semitic Languages: The Case of Amharic-Tigrigna

  • @INPROCEEDINGS{10.1007/978-3-319-95153-9_13,
        author={Michael Woldeyohannis and Million Meshesha},
        title={Experimenting Statistical Machine Translation for Ethiopic Semitic Languages: The Case of Amharic-Tigrigna},
        proceedings={Information and Communication Technology for Development for Africa. First International Conference, ICT4DA 2017, Bahir Dar, Ethiopia, September 25--27, 2017, Proceedings},
        proceedings_a={ICT4DA},
        year={2018},
        month={7},
        keywords={Under-resourced language Amharic-Tigrigna Semitic language Machine translation},
        doi={10.1007/978-3-319-95153-9_13}
    }
    
  • Michael Woldeyohannis
    Million Meshesha
    Year: 2018
    Experimenting Statistical Machine Translation for Ethiopic Semitic Languages: The Case of Amharic-Tigrigna
    ICT4DA
    Springer
    DOI: 10.1007/978-3-319-95153-9_13
Michael Woldeyohannis (1,*), Million Meshesha (1,*)
  • 1: Addis Ababa University
*Contact email: michael.melese@aau.edu.et, million.meshesha@aau.edu.et

Abstract

This research experiments with Amharic-Tigrigna machine translation to promote information sharing. Since no Amharic-Tigrigna parallel text corpus exists, we prepared one for the machine translation system from the religious domain, specifically from the Bible. Data preparation involves sentence alignment, sentence splitting, tokenization, and normalization of the Amharic-Tigrigna parallel corpora, followed by splitting the dataset into training, tuning, and testing data. An Amharic-Tigrigna translation model was then constructed from the training data and further tuned for better translation. Finally, given a target language model, the translation system generates target output with reference to the translation model, using words and morphemes as translation units. The experimental results are promising for designing an Amharic-Tigrigna machine translation system between these resource-deficient languages. We are now working on post-editing to enhance the performance of the bilingual Amharic-Tigrigna translator.
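The data preparation steps named in the abstract (tokenization, then splitting the aligned corpus into training, tuning, and testing data) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the punctuation set, the function names `tokenize` and `split_corpus`, the shuffling strategy, and the split sizes are all assumptions made for illustration, and normalization and sentence alignment are left out because the abstract does not describe how they were done.

```python
import random
import re

# Ethiopic punctuation plus a few ASCII marks.
# Assumption: the abstract does not list which marks the authors handled.
PUNCT = "።፣፤፥፦፧፨!?.,;:"
_PUNCT_RE = re.compile(f"([{re.escape(PUNCT)}])")


def tokenize(sentence):
    """Separate punctuation marks from words and collapse whitespace."""
    spaced = _PUNCT_RE.sub(r" \1 ", sentence)
    return spaced.split()


def split_corpus(pairs, tune_size, test_size, seed=0):
    """Shuffle aligned (source, target) pairs and split them three ways.

    Returns (train, tune, test). The sizes and the seed are hypothetical
    parameters, not values reported in the paper.
    """
    if tune_size + test_size >= len(pairs):
        raise ValueError("tune and test sets leave no training data")
    shuffled = list(pairs)  # copy so the caller's list is not reordered
    random.Random(seed).shuffle(shuffled)
    test = shuffled[:test_size]
    tune = shuffled[test_size:test_size + tune_size]
    train = shuffled[test_size + tune_size:]
    return train, tune, test


if __name__ == "__main__":
    # Placeholder pairs standing in for aligned Amharic-Tigrigna sentences.
    corpus = [(f"source sentence {i}", f"target sentence {i}") for i in range(10)]
    train, tune, test = split_corpus(corpus, tune_size=2, test_size=2)
    print(len(train), len(tune), len(test))
```

Shuffling each pair as a single unit keeps the source and target sides aligned, and a fixed seed makes the split reproducible across runs.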