e-Infrastructure and e-Services for Developing Countries. 13th EAI International Conference, AFRICOMM 2021, Zanzibar, Tanzania, December 1-3, 2021, Proceedings

Research Article

Extension of the Hybrid Method for Efficient Imputation of Records with Several Missing Attributes

Cite
  • @INPROCEEDINGS{10.1007/978-3-031-06374-9_17,
        author={Kone Dramane and Kimou Kouadio Prosper and Goore Bi Tra},
        title={Extension of the Hybrid Method for Efficient Imputation of Records with Several Missing Attributes},
        proceedings={e-Infrastructure and e-Services for Developing Countries. 13th EAI International Conference, AFRICOMM 2021, Zanzibar, Tanzania, December 1-3, 2021, Proceedings},
        proceedings_a={AFRICOMM},
        year={2022},
        month={5},
        keywords={Correlation, Discretization, Classification, Data Quality},
        doi={10.1007/978-3-031-06374-9_17}
    }
    
  • Kone Dramane, Kimou Kouadio Prosper, Goore Bi Tra (2022). Extension of the Hybrid Method for Efficient Imputation of Records with Several Missing Attributes. AFRICOMM. Springer. DOI: 10.1007/978-3-031-06374-9_17
Kone Dramane1,*, Kimou Kouadio Prosper1, Goore Bi Tra1
  • 1: Computer and Telecommunications Research Laboratory: LARIT
*Contact email: dramane.kone18@inphb.ci

Abstract

Handling records with several missing discrete values in a database remains a delicate problem. Such records can bias the output of data mining algorithms and thereby invalidate their results. In this paper, we present an extension of the Hybrid Method for Efficient Imputation of Discrete Missing Attributes (HMID) that handles these records effectively. The method partitions the database into two subsets, one containing the complete records and the other the incomplete records. From the complete subset, decision trees are built for all discrete attributes that have missing values. Records with multiple missing values can fall in the same leaf or in different leaves. When they fall in the same leaf, they are estimated directly by the HMID method. Otherwise, the leaves containing them are merged into a horizontal segment to determine the dominant modality of the complete attributes, and the records' missing values are then estimated. We evaluate our algorithm on two datasets: Adult, from the UCI Machine Learning Repository, and SHCDISingle, extracted from the World Bank database. Finally, we compare our algorithm with four imputation methods in terms of the accuracy of missing-value estimation and RMSE. Our results indicate that the proposed method performs better than the existing algorithms it was compared against.
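
The partition and merged-leaf steps described in the abstract can be pictured with a small sketch. It is not the authors' implementation: the abstract does not give the details of HMID or of how the decision trees are built, so the leaves are passed in as plain lists of complete records, missing values are assumed to be encoded as `None`, and each missing attribute is simply filled with its most frequent value in the merged segment. The function names `partition`, `dominant_modality`, and `impute_from_leaves` are hypothetical.

```python
from collections import Counter

MISSING = None  # assumption: a missing value is encoded as None


def partition(records):
    """Split records into (complete, incomplete) lists."""
    complete = [r for r in records if MISSING not in r.values()]
    incomplete = [r for r in records if MISSING in r.values()]
    return complete, incomplete


def dominant_modality(records, attribute):
    """Return the most frequent value of `attribute` among `records`."""
    counts = Counter(r[attribute] for r in records)
    return counts.most_common(1)[0][0]


def impute_from_leaves(record, leaves):
    """Fill the missing attributes of `record` from a merged segment.

    `leaves` is a list of lists of complete records. It stands in for the
    decision-tree leaves that the record falls into (hypothetical input;
    building the trees is outside this sketch).
    """
    # Merge the leaves into a single horizontal segment.
    segment = [r for leaf in leaves for r in leaf]
    filled = dict(record)
    for attribute, value in record.items():
        if value is MISSING:
            # Simplification: use the dominant modality of the segment.
            filled[attribute] = dominant_modality(segment, attribute)
    return filled
```

As a usage example under the same assumptions, a record missing both `workclass` and `education` would be completed by calling `partition` on the full record list and then `impute_from_leaves` with the leaves selected for that record; an original copy of the record is left unchanged because the function works on a shallow copy.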

Keywords
Correlation, Discretization, Classification, Data Quality
Published
2022-05-26
Appears in
SpringerLink
http://dx.doi.org/10.1007/978-3-031-06374-9_17