
Research Article
OCR System for the Recognition of Ethiopic Real-Life Documents
@INPROCEEDINGS{10.1007/978-3-030-93709-6_38,
  author={Hagos Tesfahun Gebremichael and Tesfahunegn Minwuyelet Mengistu and Million Mesheha Beyene and Fikreselam Gared Mengistu},
  title={OCR System for the Recognition of Ethiopic Real-Life Documents},
  proceedings={Advances of Science and Technology. 9th EAI International Conference, ICAST 2021, Hybrid Event, Bahir Dar, Ethiopia, August 27--29, 2021, Proceedings, Part I},
  proceedings_a={ICAST},
  year={2022},
  month={1},
  keywords={Ethiopic scripts OCR system Gabor filter PCA GA SVM},
  doi={10.1007/978-3-030-93709-6_38}
}
Hagos Tesfahun Gebremichael
Tesfahunegn Minwuyelet Mengistu
Million Mesheha Beyene
Fikreselam Gared Mengistu
Year: 2022
ICAST
Springer
DOI: 10.1007/978-3-030-93709-6_38
Abstract
A large body of real-life documents written in Ethiopic script contains vital information and knowledge about history, culture, economy, politics, religion, and science. This knowledge has to be shared, and advances in technologies such as Optical Character Recognition (OCR) create the need to digitize these documents and make them available for public use. OCR is a process that allows printed, typewritten, and handwritten text to be recognized optically and converted into a machine-readable format that a computer can accept for further processing. Effective OCR systems have already been developed for languages such as English that are widely used internationally. Research on Amharic OCR has been ongoing since 1997, and attempts have been made to adopt recognition algorithms to develop Amharic OCR. This study is thus an attempt to develop an OCR system for real-life documents written in Ethiopic characters. We propose a novel feature extraction scheme using a Gabor filter and Principal Component Analysis (PCA), followed by a Genetic Algorithm (GA) based on a support vector machine (SVM) classifier. The prototype was tested on real-life Ethiopic documents such as books, newspapers, and magazines, and achieved an average accuracy of 98.33% on Ethiopic characters.
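The feature extraction stage named in the abstract (Gabor filtering followed by PCA) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the abstract does not give the filter bank, its parameters, or the PCA dimensionality, so the kernel size, `sigma`, `wavelength`, `gamma`, the four orientations, the 4x4 pooling grid, and every function name below are assumptions chosen for illustration. The GA and SVM stages are left out, and random arrays stand in for segmented character images.

```python
import numpy as np


def gabor_kernel(size=9, sigma=2.0, theta=0.0, wavelength=4.0, gamma=0.5):
    """Real part of a 2-D Gabor kernel (all parameter values are illustrative)."""
    half = size // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1].astype(float)
    # Rotate the coordinate system by the filter orientation theta.
    xr = x * np.cos(theta) + y * np.sin(theta)
    yr = -x * np.sin(theta) + y * np.cos(theta)
    envelope = np.exp(-(xr ** 2 + (gamma * yr) ** 2) / (2.0 * sigma ** 2))
    carrier = np.cos(2.0 * np.pi * xr / wavelength)
    return envelope * carrier


def filter_image(image, kernel):
    """Circular convolution via the FFT; adequate for a sketch."""
    spectrum = np.fft.fft2(image) * np.fft.fft2(kernel, s=image.shape)
    return np.real(np.fft.ifft2(spectrum))


def gabor_features(image, n_orientations=4, grid=4):
    """Block-averaged Gabor response magnitudes, one set per orientation."""
    feats = []
    for k in range(n_orientations):
        theta = k * np.pi / n_orientations
        response = np.abs(filter_image(image, gabor_kernel(theta=theta)))
        h, w = response.shape
        # Crop so the image divides evenly, then average over grid x grid blocks.
        cropped = response[:h - h % grid, :w - w % grid]
        blocks = cropped.reshape(grid, h // grid, grid, w // grid)
        feats.append(blocks.mean(axis=(1, 3)).ravel())
    return np.concatenate(feats)


def pca_fit(X, n_components):
    """Return the feature mean and the top principal directions (as rows)."""
    mean = X.mean(axis=0)
    # Rows of vt are principal directions, ordered by decreasing variance.
    _, _, vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, vt[:n_components]


def pca_transform(X, mean, components):
    """Project centered features onto the principal directions."""
    return (X - mean) @ components.T


# Hypothetical data: 20 random 32x32 arrays standing in for character images.
rng = np.random.default_rng(0)
images = rng.random((20, 32, 32))
X = np.stack([gabor_features(img) for img in images])
mean, components = pca_fit(X, n_components=8)
Z = pca_transform(X, mean, components)
print(X.shape, Z.shape)
```

With these assumed settings each image yields 4 orientations x 16 blocks = 64 Gabor features, which PCA reduces to 8 components. In a full system, the reduced vectors `Z` would be passed to the classifier stage.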