
Research Article
Extension of the Hybrid Method for Efficient Imputation of Records with Several Missing Attributes
@INPROCEEDINGS{10.1007/978-3-031-06374-9_17,
  author={Kone Dramane and Kimou Kouadio Prosper and Goore Bi Tra},
  title={Extension of the Hybrid Method for Efficient Imputation of Records with Several Missing Attributes},
  proceedings={e-Infrastructure and e-Services for Developing Countries. 13th EAI International Conference, AFRICOMM 2021, Zanzibar, Tanzania, December 1-3, 2021, Proceedings},
  proceedings_a={AFRICOMM},
  year={2022},
  month={5},
  keywords={Correlation, Discretization, Classification, Data Quality},
  doi={10.1007/978-3-031-06374-9_17}
}
Kone Dramane
Kimou Kouadio Prosper
Goore Bi Tra
Year: 2022
AFRICOMM
Springer
DOI: 10.1007/978-3-031-06374-9_17
Abstract
Handling database records that have several missing discrete values remains a delicate problem. Such records can bias the results of data mining algorithms and thereby invalidate them. In this paper, we present an extension of the Hybrid Method for Efficient Imputation of Discrete Missing Attributes (HMID) that handles these records effectively. The method partitions the database into two subsets, one containing the complete records and the other the incomplete records. From the complete subset, a decision tree is built for each discrete attribute that has missing values. Records with multiple missing values can fall in the same leaf or in different leaves. When they fall in the same leaf, they are estimated directly by the HMID method. Otherwise, the leaves containing them are merged into a horizontal segment, and the dominant modality of the complete attributes in that segment is used to estimate these records. We evaluate our algorithm on two datasets: Adult, from the UCI Machine Learning Repository, and SHCDISingle, extracted from the World Bank database. Finally, we compare our algorithm with four imputation methods in terms of the accuracy of missing-value estimation and RMSE. Our results indicate that the proposed method outperforms the four algorithms it was compared with.
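The partition and segment steps described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. Decision tree construction is omitted and the leaves are assumed to be given as lists of complete records. Reading "dominant modality" as the most frequent value of an attribute within the merged segment is an assumption, as are all function names and the toy data.

```python
from collections import Counter

MISSING = None  # assumption: a missing discrete value is encoded as None


def partition(records):
    """Split records into (complete, incomplete) subsets."""
    complete = [r for r in records if MISSING not in r.values()]
    incomplete = [r for r in records if MISSING in r.values()]
    return complete, incomplete


def merge_leaves(leaves):
    """Merge several leaves (lists of complete records) into one segment."""
    return [record for leaf in leaves for record in leaf]


def dominant_modality(segment, attribute):
    """Most frequent value of `attribute` among the records in `segment`."""
    counts = Counter(record[attribute] for record in segment)
    return counts.most_common(1)[0][0]


def impute_from_segment(record, segment):
    """Fill every missing attribute of `record` with the segment's dominant modality."""
    filled = dict(record)
    for attribute, value in record.items():
        if value is MISSING:
            filled[attribute] = dominant_modality(segment, attribute)
    return filled


# Hypothetical toy data: two leaves assumed to come from two different trees.
leaf_a = [
    {"workclass": "Private", "education": "HS"},
    {"workclass": "Private", "education": "HS"},
]
leaf_b = [{"workclass": "Self-emp", "education": "BSc"}]

records = leaf_a + leaf_b + [{"workclass": None, "education": None}]
complete, incomplete = partition(records)

segment = merge_leaves([leaf_a, leaf_b])
imputed = [impute_from_segment(r, segment) for r in incomplete]
print(imputed)
```

In this sketch a record whose missing attributes land in a single leaf would be handled by passing that leaf alone as the segment. The actual HMID estimation within a leaf may differ from a simple majority vote.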