Proceedings of the 1st International Conference on Statistics and Analytics, ICSA 2019, 2-3 August 2019, Bogor, Indonesia

Research Article

Bayes Risk Post-Pruning in Decision Tree to Overcome Overfitting Problem on Customer Churn Classification

    @INPROCEEDINGS{10.4108/eai.2-8-2019.2290487,
        author={Devina Christianti and Sarini Abdullah and Siti Nurrohmah},
        title={Bayes Risk Post-Pruning in Decision Tree to Overcome Overfitting Problem on Customer Churn Classification},
        proceedings={Proceedings of the 1st International Conference on Statistics and Analytics, ICSA 2019, 2-3 August 2019, Bogor, Indonesia},
        publisher={EAI},
        proceedings_a={ICSA},
        year={2020},
        month={1},
        keywords={binary class c45 algorithm entropy gain ratio zero-one-loss},
        doi={10.4108/eai.2-8-2019.2290487}
    }
    
Devina Christianti¹*, Sarini Abdullah¹, Siti Nurrohmah¹
  ¹ Department of Mathematics, Universitas Indonesia, Depok, Indonesia
*Contact email: devina.christianti@sci.ui.ac.id

Abstract

Classification is the process of assigning data to one of a set of existing classes. The decision tree is claimed to be faster and to produce better accuracy than other classifiers. However, it has a drawback: the classifier is susceptible to overfitting. This problem can be mitigated by post-pruning, which trims subtrees that have little influence on the classification in order to improve the model's performance in predicting data. This paper proposes a post-pruning method that applies Bayes risk, in which the estimated risk of each parent node is compared with that of its leaves. The method is applied to two customer churn classification datasets, one from the Kaggle site and one from IBM Datasets, with three different training set sizes (60%, 70%, and 80%). The results show that Bayes Risk Post-Pruning can improve decision tree performance, and that a larger training set was associated with higher accuracy, precision, and recall of the model.
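
The comparison of a parent node's risk with that of its leaves can be illustrated with a minimal Python sketch. This is not the authors' implementation: the `Node` class, the function names, and the pruning rule (collapse a parent when predicting its majority class costs no more under zero-one loss than keeping its leaves) are assumptions made for illustration. Risks are compared as misclassification counts, which is equivalent to comparing risks because every node shares the same total sample size as denominator.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    """Hypothetical tree node: class counts of the samples that reach it."""
    counts: List[int]
    children: List["Node"] = field(default_factory=list)

    def is_leaf(self) -> bool:
        return not self.children


def leaf_errors(node: Node) -> int:
    """Zero-one-loss errors if the node predicts its majority class.

    Dividing by the total sample size gives the node's estimated risk.
    """
    return sum(node.counts) - max(node.counts)


def subtree_errors(node: Node) -> int:
    """Sum of zero-one-loss errors over all leaves below the node."""
    if node.is_leaf():
        return leaf_errors(node)
    return sum(subtree_errors(child) for child in node.children)


def prune(node: Node) -> Node:
    """Bottom-up post-pruning (assumed rule, see the lead-in).

    A parent is turned into a leaf when its own risk does not exceed
    the combined risk of the leaves in its subtree.
    """
    if node.is_leaf():
        return node
    node.children = [prune(child) for child in node.children]
    if leaf_errors(node) <= subtree_errors(node):
        node.children = []  # the split does not reduce the risk: collapse it
    return node


def count_leaves(node: Node) -> int:
    if node.is_leaf():
        return 1
    return sum(count_leaves(child) for child in node.children)


if __name__ == "__main__":
    # Toy binary tree (counts are [non-churn, churn]); the data are made up.
    left = Node([45, 5], [Node([30, 3]), Node([15, 2])])   # split changes no prediction
    right = Node([5, 45], [Node([5, 0]), Node([0, 45])])   # split removes all errors
    root = Node([50, 50], [left, right])
    print("leaves before pruning:", count_leaves(root))
    prune(root)
    print("leaves after pruning:", count_leaves(root))
```

In the toy tree, the left split is collapsed because both of its leaves predict the same class as their parent, while the right split is kept because it lowers the number of misclassified samples. When the counts come from the training data alone, a split can never increase the error count, so this rule only removes splits that give no reduction; a risk estimate from held-out data or with a correction term would prune more aggressively.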