IoT as a Service. 4th EAI International Conference, IoTaaS 2018, Xi’an, China, November 17–18, 2018, Proceedings

Research Article

Missing Data Imputation for Machine Learning

  • @INPROCEEDINGS{10.1007/978-3-030-14657-3_7,
        author={Shaoqian Wang and Bo Li and Mao Yang and Zhongjiang Yan},
        title={Missing Data Imputation for Machine Learning},
        proceedings={IoT as a Service. 4th EAI International Conference, IoTaaS 2018, Xi’an, China, November 17--18, 2018, Proceedings},
        proceedings_a={IOTAAS},
        year={2019},
        month={3},
        keywords={Data imputation, Machine learning, Artificial Neural Network},
        doi={10.1007/978-3-030-14657-3_7}
    }
    
  • Shaoqian Wang
    Bo Li
    Mao Yang
    Zhongjiang Yan
    Year: 2019
    Missing Data Imputation for Machine Learning
    IOTAAS
    Springer
    DOI: 10.1007/978-3-030-14657-3_7
Shaoqian Wang1,*, Bo Li1,*, Mao Yang1,*, Zhongjiang Yan1,*
  • 1: Northwestern Polytechnical University
*Contact email: wangshaoqian@mail.nwpu.edu.cn, libo.npu@nwpu.edu.cn, yangmao@nwpu.edu.cn, zhjyan@nwpu.edu.cn

Abstract

Imputing missing values is an important step in data preprocessing. Datasets often contain missing values for a variety of reasons that arise during data collection, and a good imputation algorithm can increase the reliability of the dataset and reduce the impact of the missing values on it. In this paper, we propose a missing data imputation method based on an Artificial Neural Network (ANN) for classification datasets. For each record that contains missing values, we build a list of candidate replacement values drawn from the complete records. Our ANN model is trained on the complete records and selects the most appropriate value from the list as the final result, based on the label category of the record with the missing data. In our experiments, we compare our algorithm with the traditional single value imputation and mean value imputation methods on the Pima dataset. The results show that our proposed algorithm achieves better classification results when the dataset contains more missing values.
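
The candidate-list idea described in the abstract can be illustrated with a minimal sketch. It is one possible reading of the abstract, not the authors' implementation. In particular, the scorer below is a nearest-centroid stand-in rather than an ANN, and all names (`class_centroids`, `label_score`, `impute_record`) and the toy data are hypothetical. A mean value imputation baseline is included for contrast.

```python
from statistics import mean


def class_centroids(X, y):
    """Stand-in for the trained model (assumption: NOT an ANN).

    Returns the per-class mean of every feature over the complete records.
    """
    centroids = {}
    for label in set(y):
        rows = [x for x, lab in zip(X, y) if lab == label]
        centroids[label] = [mean(col) for col in zip(*rows)]
    return centroids


def label_score(centroids, x, label):
    """Negative squared distance to the centroid of `label`.

    Features that are still missing (None) are skipped.
    """
    return -sum(
        (a - b) ** 2 for a, b in zip(x, centroids[label]) if a is not None
    )


def impute_record(x, label, complete_X, centroids):
    """Fill each None in x with the candidate that best fits the record's label."""
    filled = list(x)
    for j, value in enumerate(x):
        if value is not None:
            continue
        # Candidate list: distinct values seen in column j of the complete records.
        candidates = sorted({row[j] for row in complete_X})
        filled[j] = max(
            candidates,
            key=lambda c: label_score(
                centroids, filled[:j] + [c] + filled[j + 1:], label
            ),
        )
    return filled


def mean_impute_record(x, complete_X):
    """Baseline: fill each None with the column mean of the complete records."""
    return [
        mean(row[j] for row in complete_X) if v is None else v
        for j, v in enumerate(x)
    ]


# Toy data (hypothetical): two features, two classes.
complete_X = [[1.0, 10.0], [2.0, 11.0], [4.0, 12.0],
              [8.0, 30.0], [9.0, 31.0], [9.5, 32.0]]
complete_y = [0, 0, 0, 1, 1, 1]
centroids = class_centroids(complete_X, complete_y)

print(impute_record([None, 11.0], 0, complete_X, centroids))  # [2.0, 11.0]
print(mean_impute_record([None, 11.0], complete_X))
```

In this toy example the label-aware selection picks a value that was actually observed within the record's own class, whereas the mean baseline fills in a value lying between the two classes. Replacing `label_score` with the output of a trained neural network would bring the sketch closer to the approach the abstract describes.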