Proceedings of the 11th International Applied Business and Engineering Conference, ABEC 2023, September 21st, 2023, Bengkalis, Riau, Indonesia

Research Article

Improved Decision Tree Accuracy (C4.5) with Attribute Reduction Using Forward Selection in Data Classification

  • @INPROCEEDINGS{10.4108/eai.21-9-2023.2342901,
        author={Raden Mas Rizky Yohannes Cristanto and Elviawaty Muisa Zamzami and Fahmi Fahmi},
        title={Improved Decision Tree Accuracy (C4.5) with Attribute Reduction Using Forward Selection in Data Classification},
        proceedings={Proceedings of the 11th International Applied Business and Engineering Conference, ABEC 2023, September 21st, 2023, Bengkalis, Riau, Indonesia},
        publisher={EAI},
        proceedings_a={ABEC},
        year={2024},
        month={2},
        keywords={classification decision tree c45 attribute reduction forward selection},
        doi={10.4108/eai.21-9-2023.2342901}
    }
    
Raden Mas Rizky Yohannes Cristanto1,*, Elviawaty Muisa Zamzami1, Fahmi Fahmi2
  • 1: Faculty of Computer Science and Information Technology, University of Sumatera Utara, Medan, Indonesia
  • 2: Electrical Engineering Department, Universitas Sumatera Utara, Indonesia
*Contact email: ramariyocris@gmail.com

Abstract

The core step in building a C4.5 decision tree is splitting the data on attributes. On its own, however, the attribute-splitting procedure in C4.5 does not yield the best achievable prediction accuracy: irrelevant or weakly relevant features add noise to the data and can produce a very large tree that overfits. As a result, the data becomes unbalanced and the classification accuracy of the C4.5 model drops. To improve accuracy, attribute reduction is applied to remove the less relevant attributes. This study proposes forward selection as the attribute reduction method; it yields a set of mutually uncorrelated features, which C4.5 then uses for classification. Two datasets were used, obtained from the UCI Machine Learning Repository and Kaggle.com: Diabetic Retinopathy Debrecen and South German Credit. Diabetic Retinopathy Debrecen contains 1,151 records with 20 attributes, and South German Credit contains 1,000 records with 20 attributes. Classification performance was evaluated using measures computed from the confusion matrix. In the tests, the proposed method increased classification accuracy by 7.68%. Forward selection is therefore considered an effective technique for reducing attributes and improving the classification accuracy of C4.5.
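
The workflow summarized in the abstract (greedy forward selection wrapped around a classifier, with accuracy read off a confusion matrix) might look roughly like the Python sketch below. It is an illustration under stated assumptions, not the authors' implementation. The classifier is a simple majority-vote lookup table standing in for C4.5, the six-line dataset is invented toy data rather than either dataset used in the study, and the names `forward_selection`, `make_scorer`, and `confusion_matrix` are hypothetical helpers introduced here.

```python
from collections import Counter, defaultdict


def confusion_matrix(y_true, y_pred, positive=1):
    """Return (tp, fp, fn, tn) for a binary classification problem."""
    tp = fp = fn = tn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == positive:
            if truth == positive:
                tp += 1
            else:
                fp += 1
        elif truth == positive:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn


def accuracy(y_true, y_pred):
    """Accuracy = (TP + TN) / all predictions, from the confusion matrix."""
    tp, fp, fn, tn = confusion_matrix(y_true, y_pred)
    return (tp + tn) / (tp + fp + fn + tn)


def make_scorer(X_train, y_train, X_val, y_val):
    """Build score(subset): validation accuracy using only those attributes.

    ASSUMPTION: a majority-vote lookup table stands in for C4.5 so the
    sketch stays dependency-free; any classifier could be plugged in here.
    """
    default = Counter(y_train).most_common(1)[0][0]

    def score(subset):
        votes = defaultdict(Counter)
        for row, label in zip(X_train, y_train):
            votes[tuple(row[f] for f in subset)][label] += 1
        preds = []
        for row in X_val:
            key = tuple(row[f] for f in subset)
            # Unseen attribute combinations fall back to the default class.
            preds.append(votes[key].most_common(1)[0][0] if key in votes else default)
        return accuracy(y_val, preds)

    return score


def forward_selection(n_features, score):
    """Start from no attributes; repeatedly add the one attribute that
    raises the score the most, and stop when no addition improves it."""
    selected, best = [], float("-inf")
    remaining = list(range(n_features))
    while remaining:
        candidate = max(remaining, key=lambda f: score(selected + [f]))
        candidate_score = score(selected + [candidate])
        if candidate_score <= best:
            break
        selected.append(candidate)
        remaining.remove(candidate)
        best = candidate_score
    return selected, best


# Toy data (invented): attribute 0 determines the class, 1 and 2 are noise.
X_train = [[0, 0, 1], [0, 1, 0], [1, 0, 0], [1, 1, 1]]
y_train = [0, 0, 1, 1]
X_val = [[0, 1, 1], [1, 0, 1], [0, 0, 0], [1, 1, 0]]
y_val = [0, 1, 0, 1]

score = make_scorer(X_train, y_train, X_val, y_val)
selected, best = forward_selection(3, score)
print("selected attributes:", selected, "accuracy:", best)
print("accuracy with all attributes:", score([0, 1, 2]))
```

On this toy data the noise attributes fragment the training set into combinations never seen at validation time, which is the kind of effect attribute reduction is meant to limit. The numbers it prints describe the toy data only and say nothing about the 7.68% improvement reported in the study.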