Towards a Framework for the Preparation of High Quality Data for Use by Machine Learning Algorithms

Rasidatou Nabi; Yaya Traoré; Julie Thiombiano

Towards new e-Infrastructure and e-Services for Developing Countries. 15th International Conference, AFRICOMM 2023, Bobo-Dioulasso, Burkina Faso, November 23–25, 2023, Proceedings, Part II

Research Article

Cite (BibTeX):

@INPROCEEDINGS{10.1007/978-3-031-81573-7_12,
    author={Rasidatou Nabi and Yaya Traor\'{e} and Julie Thiombiano},
    title={Towards a Framework for the Preparation of High Quality Data for Use by Machine Learning Algorithms},
    proceedings={Towards new e-Infrastructure and e-Services for Developing Countries. 15th International Conference, AFRICOMM 2023, Bobo-Dioulasso, Burkina Faso, November 23--25, 2023, Proceedings, Part II},
    proceedings_a={AFRICOMM PART 2},
    year={2025},
    month={2},
    keywords={Data processing, Quality data, Missing data, Encoding, Normalization},
    doi={10.1007/978-3-031-81573-7_12}
}

Cite (plain text): Rasidatou Nabi, Yaya Traoré, Julie Thiombiano. Towards a Framework for the Preparation of High Quality Data for Use by Machine Learning Algorithms. AFRICOMM PART 2. Springer, 2025. DOI: 10.1007/978-3-031-81573-7_12

Rasidatou Nabi*, Yaya Traoré, Julie Thiombiano

*Contact email: rasidatou.nabi@ujkz.bf

Abstract

Nowadays, companies and organizations have access to a range of data collection tools that let them amass vast amounts of data and store it in databases. Machine learning algorithms can exploit this data to extract information that is valuable to decision-makers. However, raw data is often of poor quality: it contains errors such as missing values and outliers, so technicians and domain specialists must intervene to prepare it before the analysis can reach a satisfactory F1 score. Because manually identifying reliable data in a large pool is difficult and time-consuming, this article proposes a framework for preparing high-quality data for machine learning algorithms. Our approach is an architecture that combines data preparation techniques to produce a high-quality dataset.

Keywords: Data processing, Quality data, Missing data, Encoding, Normalization
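
The abstract and keywords name several preparation steps (handling missing data, handling outliers, encoding, normalization) without giving their details here. As a rough illustration of how such steps might be chained, the following sketch uses only the Python standard library. It is not the authors' framework: the function names, the choice of median imputation, the IQR clipping rule, min-max scaling, one-hot encoding, and the sample data are all assumptions made for this example.

```python
from statistics import median, quantiles


def impute_median(values):
    """Replace None entries with the median of the observed values (assumed strategy)."""
    observed = [v for v in values if v is not None]
    fill = median(observed)
    return [fill if v is None else v for v in values]


def clip_outliers_iqr(values, k=1.5):
    """Clip values that fall outside [Q1 - k*IQR, Q3 + k*IQR] (assumed outlier rule)."""
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [min(max(v, low), high) for v in values]


def min_max_normalize(values):
    """Rescale values linearly to [0, 1]; a constant column maps to 0.0."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def one_hot_encode(values):
    """Return the sorted categories and one 0/1 indicator vector per value."""
    categories = sorted(set(values))
    vectors = [[1 if v == c else 0 for c in categories] for v in values]
    return categories, vectors


def prepare(columns, numeric, categorical):
    """Hypothetical pipeline: impute, clip, and normalize numeric columns; one-hot encode categorical ones."""
    prepared = {}
    for name in numeric:
        col = impute_median(columns[name])
        col = clip_outliers_iqr(col)
        prepared[name] = min_max_normalize(col)
    for name in categorical:
        categories, vectors = one_hot_encode(columns[name])
        for idx, cat in enumerate(categories):
            prepared[f"{name}={cat}"] = [vec[idx] for vec in vectors]
    return prepared


# Made-up sample: one missing age and one implausible age (120).
raw = {
    "age": [25, None, 31, 29, 120],
    "city": ["Bobo", "Ouaga", "Bobo", "Ouaga", "Bobo"],
}
clean = prepare(raw, numeric=["age"], categorical=["city"])
```

The order of the numeric steps matters in this sketch: imputing first gives the quartile computation a complete column, and clipping before scaling keeps a single extreme value from compressing the rest of the range.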

Published: 2025-02-13
Appears in: SpringerLink

DOI link: http://dx.doi.org/10.1007/978-3-031-81573-7_12
