Handling Missing Values for the CN2 Algorithm

Cuong Nguyen; Phuong-Tuan Tran; Thi-Thanh-Thao Thai

Context-Aware Systems and Applications, and Nature of Computation and Communication. 7th EAI International Conference, ICCASA 2018, and 4th EAI International Conference, ICTCC 2018, Viet Tri City, Vietnam, November 22–23, 2018, Proceedings

Research Article

@INPROCEEDINGS{10.1007/978-3-030-06152-4_20,
    author={Cuong Nguyen and Phuong-Tuan Tran and Thi-Thanh-Thao Thai},
    title={Handling Missing Values for the CN2 Algorithm},
    proceedings={Context-Aware Systems and Applications, and Nature of Computation and Communication. 7th EAI International Conference, ICCASA 2018, and 4th EAI International Conference, ICTCC 2018, Viet Tri City, Vietnam, November 22--23, 2018, Proceedings},
    proceedings_a={ICCASA \& ICTCC},
    year={2019},
    month={1},
    keywords={CN2 Missing value Rule induction Data imputation},
    doi={10.1007/978-3-030-06152-4_20}
}

Cuong Nguyen, Phuong-Tuan Tran, Thi-Thanh-Thao Thai (2019). Handling Missing Values for the CN2 Algorithm. ICCASA & ICTCC. Springer. DOI: 10.1007/978-3-030-06152-4_20

Cuong Nguyen¹,*, Phuong-Tuan Tran¹,*, Thi-Thanh-Thao Thai¹,*

1: HCMC University of Foreign Languages - Information Technology

*Contact email: cuong.nd@huflit.edu.vn, tuantranphuong@huflit.edu.vn, thao.ttt@huflit.edu.vn

Abstract

Missing values are present in several practical data sets. Machine learning algorithms such as CN2 require the missing values in a data set to be pre-processed. Data imputation methods can provide estimates for a missing value. However, imputation can introduce unexpected information into the data set, which can reduce the accuracy of rule induction algorithms. If rule induction algorithms can process missing values directly, overall performance can be improved. This paper studies the CN2 algorithm and proposes a modified version, CN2MV, which processes missing values directly, without preprocessing. In tests on 17 benchmark data sets from the UCI Machine Learning Repository, CN2MV outperforms the original algorithm used with data imputation.
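As an illustration of the concern about imputation raised in the abstract, the following minimal Python sketch applies most-frequent-value imputation to a toy data set and measures how the accuracy of a single rule changes. It is not the CN2MV method or the paper's experimental setup: the toy rows, the encoding of a missing value as `None`, and the helper names `impute_mode` and `rule_accuracy` are all assumptions made for this example.

```python
from collections import Counter

MISSING = None  # assumption: a missing attribute value is encoded as None


def impute_mode(rows, column):
    """Return copies of rows with missing values in `column` replaced
    by the most frequent observed value of that column."""
    observed = [row[column] for row in rows if row[column] is not MISSING]
    if not observed:
        return [dict(row) for row in rows]
    mode = Counter(observed).most_common(1)[0][0]
    return [
        {**row, column: mode if row[column] is MISSING else row[column]}
        for row in rows
    ]


def rule_accuracy(rows, attr, value, target, target_class):
    """Accuracy of the rule `IF attr == value THEN target == target_class`
    over the rows it covers. A missing value never satisfies the test."""
    covered = [row for row in rows if row[attr] == value]
    if not covered:
        return 0.0
    hits = sum(1 for row in covered if row[target] == target_class)
    return hits / len(covered)


# Hypothetical toy data: one example has a missing "outlook" value.
rows = [
    {"outlook": "sunny", "play": "no"},
    {"outlook": "sunny", "play": "no"},
    {"outlook": MISSING, "play": "yes"},
    {"outlook": "rain", "play": "yes"},
]

before = rule_accuracy(rows, "outlook", "sunny", "play", "no")
imputed = impute_mode(rows, "outlook")
after = rule_accuracy(imputed, "outlook", "sunny", "play", "no")

print(f"rule accuracy before imputation: {before:.3f}")
print(f"rule accuracy after imputation:  {after:.3f}")
```

In this toy case the imputed value turns the third example into a counterexample for the rule, so the rule's measured accuracy drops even though nothing about the underlying data has changed. This is one simple way imputation can add information that was not in the original data set.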

Keywords: CN2, Missing value, Rule induction, Data imputation

Published: 2019-01-04
Appears in: SpringerLink

DOI link: http://dx.doi.org/10.1007/978-3-030-06152-4_20
