Research Article
Experimenting Statistical Machine Translation for Ethiopic Semitic Languages: The Case of Amharic-Tigrigna
@INPROCEEDINGS{10.1007/978-3-319-95153-9_13, author={Michael Woldeyohannis and Million Meshesha}, title={Experimenting Statistical Machine Translation for Ethiopic Semitic Languages: The Case of Amharic-Tigrigna}, proceedings={Information and Communication Technology for Development for Africa. First International Conference, ICT4DA 2017, Bahir Dar, Ethiopia, September 25--27, 2017, Proceedings}, proceedings_a={ICT4DA}, year={2018}, month={7}, keywords={Under-resourced language Amharic-Tigrigna Semitic language Machine translation}, doi={10.1007/978-3-319-95153-9_13} }
- Michael Woldeyohannis
- Million Meshesha
Year: 2018
ICT4DA
Springer
DOI: 10.1007/978-3-319-95153-9_13
Abstract
In this research, an attempt has been made to experiment with Amharic-Tigrigna machine translation to promote information sharing. Since no Amharic-Tigrigna parallel text corpus exists, we prepared one for the Amharic-Tigrigna machine translation system from the religious domain, specifically from the Bible. Data preparation involves sentence alignment, sentence splitting, tokenization and normalization of the Amharic-Tigrigna parallel corpora, followed by splitting the dataset into training, tuning and testing data. An Amharic-Tigrigna translation model was then constructed from the training data and further tuned for better translation. Finally, given the target language model, the Amharic-Tigrigna translation system generates target-language output with reference to the translation model, using either words or morphemes as the translation unit. The experimental results are promising for designing an Amharic-Tigrigna machine translation system between these resource-deficient languages. We are now working on post-editing to enhance the performance of the bilingual Amharic-Tigrigna translator.
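The data-preparation steps listed in the abstract can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' pipeline. It assumes the Bible text on each side is already keyed by a shared verse ID (e.g. `"GEN 1:1"`), so sentence alignment reduces to matching IDs. The homophone map used for Amharic normalization, the 80/10/10 train/tune/test split, and every function name (`normalize`, `split_sentences`, `tokenize`, `align_verses`, `split_corpus`, `prepare`) are hypothetical. The map is applied to the Amharic side only, because Tigrigna is usually described as keeping some of these letters distinct (e.g. ሐ vs ሀ).

```python
import random
import re
import unicodedata

# Sentence-final marks: Ethiopic full stop (U+1362), Ethiopic question mark
# (U+1367), plus ASCII "?" and "!" (assumed to occur in the corpus).
SENTENCE = re.compile(r"[^\u1362\u1367?!]+[\u1362\u1367?!]*")
# Punctuation split off as separate tokens: Ethiopic U+1362..U+1368 and ASCII.
PUNCT = re.compile(r"([\u1362-\u1368?!.,;:])")

# Hypothetical Amharic homophone folding (an assumption, not the paper's rule
# set): (variant row start, canonical row start). The first seven vowel
# orders of each row sit at the same offset, so one offset maps each pair.
HOMOPHONE_ROWS = [(0x1210, 0x1200), (0x1280, 0x1200), (0x1220, 0x1230),
                  (0x12D0, 0x12A0), (0x1340, 0x1338)]
AMHARIC_MAP = {src + k: dst + k for src, dst in HOMOPHONE_ROWS for k in range(7)}


def normalize(text, amharic=False):
    """NFC-normalize, turn the Ethiopic wordspace (U+1361) into a space,
    optionally fold Amharic homophones, and collapse runs of whitespace."""
    text = unicodedata.normalize("NFC", text).replace("\u1361", " ")
    if amharic:
        text = text.translate(AMHARIC_MAP)
    return " ".join(text.split())


def split_sentences(text):
    """Split text after sentence-final punctuation, dropping empty pieces."""
    return [s.strip() for s in SENTENCE.findall(text) if s.strip()]


def tokenize(text):
    """Whitespace tokenization with punctuation split into its own tokens."""
    return PUNCT.sub(r" \1 ", text).split()


def align_verses(am, ti):
    """Pair Amharic and Tigrigna verses that share a verse ID (assumed keys)."""
    return [(am[k], ti[k]) for k in sorted(am.keys() & ti.keys())]


def split_corpus(pairs, tune=0.1, test=0.1, seed=0):
    """Shuffle reproducibly and return (train, tune, test) lists."""
    pairs = list(pairs)
    random.Random(seed).shuffle(pairs)
    n_test, n_tune = int(len(pairs) * test), int(len(pairs) * tune)
    return (pairs[n_test + n_tune:], pairs[n_test:n_test + n_tune], pairs[:n_test])


def prepare(am_verses, ti_verses):
    """Align, sentence-split, normalize, tokenize, then split the corpus."""
    pairs = []
    for am, ti in align_verses(am_verses, ti_verses):
        am_s, ti_s = split_sentences(am), split_sentences(ti)
        if len(am_s) != len(ti_s):  # sentence counts disagree: keep verse whole
            am_s, ti_s = [am], [ti]
        for a, t in zip(am_s, ti_s):
            pairs.append((" ".join(tokenize(normalize(a, amharic=True))),
                          " ".join(tokenize(normalize(t)))))
    return split_corpus(pairs)
```

Splitting a verse into sentences only when both sides yield the same number of sentences is a conservative heuristic chosen for this sketch; it avoids creating misaligned pairs at the cost of keeping some multi-sentence verses intact. Morpheme-level experiments would additionally run a morphological segmenter over each side before the split, which this sketch does not include.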