Towards new e-Infrastructure and e-Services for Developing Countries. 15th International Conference, AFRICOMM 2023, Bobo-Dioulasso, Burkina Faso, November 23–25, 2023, Proceedings, Part II

Research Article

Towards a Framework for the Preparation of High Quality Data for Use by Machine Learning Algorithms

Cite
BibTeX Plain Text
  • @INPROCEEDINGS{10.1007/978-3-031-81573-7_12,
        author={Rasidatou Nabi and Yaya Traor\'{e} and Julie Thiombiano},
        title={Towards a Framework for the Preparation of High Quality Data for Use by Machine Learning Algorithms},
        booktitle={Towards new e-Infrastructure and e-Services for Developing Countries. 15th International Conference, AFRICOMM 2023, Bobo-Dioulasso, Burkina Faso, November 23--25, 2023, Proceedings, Part II},
        proceedings_a={AFRICOMM PART 2},
        year={2025},
        month={2},
        keywords={Data processing, Quality data, Missing data, Encoding, Normalization},
        doi={10.1007/978-3-031-81573-7_12}
    }
    
  • Rasidatou Nabi
    Yaya Traoré
    Julie Thiombiano
    Year: 2025
    Towards a Framework for the Preparation of High Quality Data for Use by Machine Learning Algorithms
    AFRICOMM PART 2
    Springer
    DOI: 10.1007/978-3-031-81573-7_12
Rasidatou Nabi*, Yaya Traoré, Julie Thiombiano
    *Contact email: rasidatou.nabi@ujkz.bf

    Abstract

    Companies and organizations now have access to a variety of data collection tools that let them amass vast amounts of data, which can be stored in databases. Machine learning algorithms can leverage this data to extract valuable information for decision-makers. However, raw data is often of poor quality, containing problems such as missing values and outliers, so technicians and domain specialists must prepare it to ensure the F1 score of the analysis. Because manually identifying reliable data in a large pool is challenging and time-consuming, this article proposes a framework for preparing high-quality data for machine learning algorithms. Our approach is an architectural method that combines data preparation techniques to produce a high-quality dataset.
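
    The abstract does not specify how the framework is built, so the following is only a minimal sketch of how the three techniques named in the keywords (missing-data handling, encoding, normalization) could be chained. The function name `prepare`, the mean-imputation rule, one-hot encoding, and min-max scaling are all assumptions chosen for illustration, not the authors' method.

```python
from statistics import mean


def prepare(rows, numeric_cols, categorical_cols):
    """Impute, encode, and normalize a list of dict records (illustrative only)."""
    prepared = [dict(r) for r in rows]  # work on copies, leave the input untouched
    if not prepared:
        return prepared

    # 1. Missing data (assumed rule): replace None in numeric columns
    #    with the mean of the observed values in that column.
    for col in numeric_cols:
        observed = [r[col] for r in prepared if r.get(col) is not None]
        fill = mean(observed) if observed else 0.0
        for r in prepared:
            if r.get(col) is None:
                r[col] = fill

    # 2. Encoding (assumed rule): one-hot encode each categorical column.
    for col in categorical_cols:
        categories = sorted({r[col] for r in prepared if r.get(col) is not None})
        for r in prepared:
            value = r.pop(col, None)
            for cat in categories:
                r[f"{col}={cat}"] = 1.0 if value == cat else 0.0

    # 3. Normalization (assumed rule): min-max scale numeric columns to [0, 1].
    for col in numeric_cols:
        values = [r[col] for r in prepared]
        low, high = min(values), max(values)
        span = high - low
        for r in prepared:
            r[col] = (r[col] - low) / span if span else 0.0

    return prepared


# Hypothetical records: one missing age, one categorical column.
records = [
    {"age": 20, "city": "Bobo"},
    {"age": None, "city": "Ouaga"},
    {"age": 40, "city": "Bobo"},
]
clean = prepare(records, numeric_cols=["age"], categorical_cols=["city"])
```

    Each step here is a placeholder: a real pipeline would choose the imputation, encoding, and scaling strategies per column, and would treat outliers (also mentioned in the abstract) before normalizing.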

    Keywords
    Data processing, Quality data, Missing data, Encoding, Normalization
    Published
    2025-02-13
    Appears in
    SpringerLink
    http://dx.doi.org/10.1007/978-3-031-81573-7_12
    Copyright © 2023–2025 ICST