Experimenting Statistical Machine Translation for Ethiopic Semitic Languages: The Case of Amharic-Tigrigna

Michael Woldeyohannis; Million Meshesha

Information and Communication Technology for Development for Africa. First International Conference, ICT4DA 2017, Bahir Dar, Ethiopia, September 25–27, 2017, Proceedings

Research Article


@INPROCEEDINGS{10.1007/978-3-319-95153-9_13,
    author={Michael Woldeyohannis and Million Meshesha},
    title={Experimenting Statistical Machine Translation for Ethiopic Semitic Languages: The Case of Amharic-Tigrigna},
    proceedings={Information and Communication Technology for Development for Africa. First International Conference, ICT4DA 2017, Bahir Dar, Ethiopia, September 25--27, 2017, Proceedings},
    proceedings_a={ICT4DA},
    year={2018},
    month={7},
    keywords={Under-resourced language, Amharic-Tigrigna, Semitic language, Machine translation},
    doi={10.1007/978-3-319-95153-9_13}
}

Michael Woldeyohannis, Million Meshesha (2018). Experimenting Statistical Machine Translation for Ethiopic Semitic Languages: The Case of Amharic-Tigrigna. ICT4DA. Springer. DOI: 10.1007/978-3-319-95153-9_13

Michael Woldeyohannis¹,*, Million Meshesha¹,*

1: Addis Ababa University

*Contact email: michael.melese@aau.edu.et, million.meshesha@aau.edu.et

Abstract

In this research, an attempt has been made to experiment with Amharic-Tigrigna machine translation in order to promote information sharing. Since no Amharic-Tigrigna parallel text corpus exists, we prepared one for an Amharic-Tigrigna machine translation system from the religious domain, specifically from the Bible. Data preparation involves sentence alignment, sentence splitting, tokenization, and normalization of the Amharic-Tigrigna parallel corpora, followed by splitting the dataset into training, tuning, and testing data. An Amharic-Tigrigna translation model was then constructed from the training data and further tuned for better translation. Finally, given a target language model, the Amharic-Tigrigna translation system generates target output with reference to the translation model, using the word and the morpheme as translation units. The experimental results are promising for designing an Amharic-Tigrigna machine translation system between resource-deficient languages. We are now working on post-editing to enhance the performance of the bilingual Amharic-Tigrigna translator.

Keywords: Under-resourced language, Amharic-Tigrigna, Semitic language, Machine translation
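
The data preparation steps named in the abstract (normalization, tokenization, and splitting an aligned corpus into training, tuning, and testing data) can be illustrated with a minimal Python sketch. This is not the authors' pipeline: the function names, the 10% tuning and testing fractions, the fixed seed, and the use of Unicode NFC plus whitespace collapsing as a stand-in for language-specific normalization are all assumptions made for illustration.

```python
import random
import unicodedata


def normalize(line):
    """Generic stand-in for normalization: Unicode NFC, collapsed whitespace.

    Assumption: the language-specific normalization used in the paper is not
    described in the abstract, so only script-neutral cleanup is done here.
    """
    return " ".join(unicodedata.normalize("NFC", line).split())


def tokenize(line):
    """Whitespace tokenization of a normalized line (word-level units)."""
    return normalize(line).split()


def split_parallel(pairs, tune_frac=0.1, test_frac=0.1, seed=0):
    """Split aligned (source, target) sentence pairs into train/tune/test.

    The fractions and seed are hypothetical defaults, not values from the paper.
    Pairs are shuffled together so sentence alignment is preserved.
    """
    cleaned = [(normalize(s), normalize(t)) for s, t in pairs]
    cleaned = [(s, t) for s, t in cleaned if s and t]  # drop empty sides
    rng = random.Random(seed)
    rng.shuffle(cleaned)
    n_tune = int(len(cleaned) * tune_frac)
    n_test = int(len(cleaned) * test_frac)
    tune = cleaned[:n_tune]
    test = cleaned[n_tune:n_tune + n_test]
    train = cleaned[n_tune + n_test:]
    return train, tune, test


# Placeholder sentence pairs; a real corpus would hold aligned
# Amharic and Tigrigna sentences.
example_pairs = [("am  sentence %d" % i, "ti sentence   %d" % i) for i in range(20)]
train, tune, test = split_parallel(example_pairs)
```

Keeping each source sentence and its translation in one tuple while shuffling means the three subsets stay aligned and disjoint, which is what a statistical translation toolkit expects for training, tuning, and evaluation.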

Published: 2018-07-10
Appears in: SpringerLink

http://dx.doi.org/10.1007/978-3-319-95153-9_13
