e-Infrastructure and e-Services for Developing Countries. 13th EAI International Conference, AFRICOMM 2021, Zanzibar, Tanzania, December 1-3, 2021, Proceedings

Research Article

On the Entropy of Written Afan Oromo

Cite (BibTeX)
@INPROCEEDINGS{10.1007/978-3-031-06374-9_3,
    author={Dereje Hailemariam Woldegebreal and Tsegamlak Terefe Debella and Kalkidan Dejenie Molla},
    title={On the Entropy of Written Afan Oromo},
    proceedings={e-Infrastructure and e-Services for Developing Countries. 13th EAI International Conference, AFRICOMM 2021, Zanzibar, Tanzania, December 1-3, 2021, Proceedings},
    proceedings_a={AFRICOMM},
    year={2022},
    month={5},
    keywords={Compression Entropy Encoding Written Afan Oromo},
    doi={10.1007/978-3-031-06374-9_3}
}
    
Dereje Hailemariam Woldegebreal (1,*), Tsegamlak Terefe Debella (1), Kalkidan Dejenie Molla (2)
  • 1: School of Electrical and Computer Engineering (SECE)
  • 2: SECE
*Contact email: dereje.hailemariam@aait.edu.et

Abstract

Afan Oromo is the language of the Oromo people, the largest ethnolinguistic group in Ethiopia. Written Afan Oromo uses the Latin alphabet. In electronic communication systems, its letters are represented with fixed-length codes such as standard ASCII-8, which uses 8 bits/letter, or UTF-16, which uses 16 bits/letter. Moreover, the language uses gemination (i.e., doubling of a consonant), and long vowels are written as double letters, e.g., “dammee”, meaning sweet potato. From an information-theoretic perspective, this doubling and the fixed-length encoding schemes add redundancy to written Afan Oromo. This redundancy, in turn, contributes to inefficient use of communication resources, such as bandwidth and energy, during the transmission and storage of texts written in Afan Oromo. This paper uses information theory to estimate the entropy of written Afan Oromo. We use a higher-order Markov chain, also called an N-gram model, to compute the entropy of sample text corpora (or a written source) by capturing the dependencies among sequences of letters in the corpora. Entropy measures the average information in bits per letter or per block of letters, depending on the N-gram considered. Entropy also gives the lower bound on the average code length achievable by lossless compression schemes such as Huffman coding. When the source is modeled as memoryless (N = 1, i.e., letters occur independently of each other), the entropy of the language is 4.31 bits/letter. Compared with ASCII-8, the achievable compression level is about 46%. When N = 19, the estimated entropy is as low as 0.85 bits/letter, which corresponds to a compression level of about 89%. Huffman and arithmetic source-coding algorithms are implemented to check the achievable compression level. For the collected sample corpora, the average compression achieved by the Huffman algorithm ranges from 42.2% to 64.9% for N = 1 to 5. These compression levels are close to the bounds given by the theoretical entropy. With the increasing use of the language in telecom services and storage systems, the entropy results show the need for further investigation of language-specific applications, such as compression algorithms.
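The workflow summarized above (estimate per-letter entropy from N-gram statistics, compare it with the 8 bits/letter of ASCII-8, and check what Huffman coding actually achieves) can be sketched in Python. This is a minimal illustration under stated assumptions, not the authors' implementation: it assumes the entropy estimate is the empirical entropy of non-overlapping N-letter blocks divided by N, which is one common estimator; the paper's exact estimator, corpus preprocessing, and arithmetic-coding experiments are not reproduced. All function names are hypothetical, and the sample string is a toy text, not the paper's corpus.

```python
import heapq
import math
from collections import Counter
from itertools import count


def blocks(text: str, n: int) -> list[str]:
    """Split text into non-overlapping n-letter blocks, dropping a short tail."""
    return [text[i:i + n] for i in range(0, len(text) - n + 1, n)]


def block_entropy_per_letter(text: str, n: int) -> float:
    """Empirical entropy of n-letter blocks divided by n, in bits/letter.

    Assumed estimator (hypothetical): H(block) / n over non-overlapping blocks.
    """
    freqs = Counter(blocks(text, n))
    total = sum(freqs.values())
    if total == 0:
        raise ValueError("text is shorter than one block")
    h_block = -sum((c / total) * math.log2(c / total) for c in freqs.values())
    return h_block / n


def huffman_code_lengths(freqs: Counter) -> dict[str, int]:
    """Codeword length of each symbol in a Huffman code built from freqs."""
    if len(freqs) == 1:  # a lone symbol still needs a 1-bit codeword
        return {sym: 1 for sym in freqs}
    tie = count()  # unique tie-breaker so heapq never compares the dicts
    heap = [(c, next(tie), {sym: 0}) for sym, c in freqs.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        c1, _, d1 = heapq.heappop(heap)  # two least frequent subtrees
        c2, _, d2 = heapq.heappop(heap)
        # Merging pushes every symbol in both subtrees one level deeper.
        merged = {sym: depth + 1 for sym, depth in {**d1, **d2}.items()}
        heapq.heappush(heap, (c1 + c2, next(tie), merged))
    return heap[0][2]


def huffman_bits_per_letter(text: str, n: int) -> float:
    """Average Huffman code length when coding n-letter blocks, in bits/letter."""
    freqs = Counter(blocks(text, n))
    lengths = huffman_code_lengths(freqs)
    total = sum(freqs.values())
    return sum(freqs[sym] * lengths[sym] for sym in freqs) / (total * n)


def compression_vs_ascii8(bits_per_letter: float) -> float:
    """Fractional saving relative to a fixed 8 bits/letter code."""
    return 1 - bits_per_letter / 8


if __name__ == "__main__":
    sample = "dammee " * 40  # toy text only, not the paper's corpus
    for n in (1, 2, 3):
        h = block_entropy_per_letter(sample, n)
        huff = huffman_bits_per_letter(sample, n)
        print(f"N={n}: entropy {h:.3f} b/letter, Huffman {huff:.3f} b/letter, "
              f"saving vs ASCII-8 {compression_vs_ascii8(huff):.1%}")
```

Plugging in the figures from the abstract, `compression_vs_ascii8(4.31)` gives 0.46125 and `compression_vs_ascii8(0.85)` gives 0.89375, which match the quoted 46% and 89%. Because a Huffman code's average length lies within 1 bit of the block entropy, coding N-letter blocks keeps the Huffman rate within 1/N bit/letter of the per-letter block entropy. This is why coding longer blocks can bring Huffman compression closer to the entropy bound, at the cost of a larger code table.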

Keywords
Compression, Entropy, Encoding, Written Afan Oromo
Published
2022-05-26
Appears in
SpringerLink
http://dx.doi.org/10.1007/978-3-031-06374-9_3