Research Article
Improved Decision Tree Accuracy (C4.5) with Attribute Reduction Using Forward Selection in Data Classification
@INPROCEEDINGS{10.4108/eai.21-9-2023.2342901,
  author={Raden Mas Rizky Yohannes Cristanto and Elviawaty Muisa Zamzami and Fahmi Fahmi},
  title={Improved Decision Tree Accuracy (C4.5) with Attribute Reduction Using Forward Selection in Data Classification},
  proceedings={Proceedings of the 11th International Applied Business and Engineering Conference, ABEC 2023, September 21st, 2023, Bengkalis, Riau, Indonesia},
  publisher={EAI},
  proceedings_a={ABEC},
  year={2024},
  month={2},
  keywords={classification; decision tree; C4.5; attribute reduction; forward selection},
  doi={10.4108/eai.21-9-2023.2342901}
}
- Raden Mas Rizky Yohannes Cristanto
- Elviawaty Muisa Zamzami
- Fahmi Fahmi
Year: 2024
ABEC
EAI
DOI: 10.4108/eai.21-9-2023.2342901
Abstract
The core step in building a C4.5 decision tree is attribute splitting. However, the attribute-splitting procedure in C4.5 alone cannot guarantee optimal prediction accuracy: unwanted features introduce noise and low-relevance attributes, which can produce very large decision trees (overfitting). As a result, the resulting model becomes unbalanced and the classification accuracy of the C4.5 decision tree decreases. To improve classification accuracy, attribute reduction is applied to discard less relevant attributes. Forward selection is therefore proposed as the attribute reduction method: it produces a subset of mutually uncorrelated features, which is then used by the C4.5 decision tree for classification. This study used two datasets, Diabetic Retinopathy Debrecen from the UCI Machine Learning Repository and South German Credit from Kaggle.com. Diabetic Retinopathy Debrecen contains 1,151 records with 20 attributes, while South German Credit contains 1,000 records with 20 attributes. Classification performance was evaluated using the confusion matrix. The test results show that the proposed method increases classification accuracy by 7.68%. Forward selection is therefore considered an effective technique for reducing attributes and improving the classification accuracy of the C4.5 decision tree.
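The sketch below illustrates the general pipeline described in the abstract: forward selection to reduce the attribute set, followed by a decision tree trained on the selected attributes and evaluated with a confusion matrix. It is only an approximation under stated assumptions, not the authors' implementation: scikit-learn's DecisionTreeClassifier (CART with the entropy criterion) stands in for C4.5, and the CSV file name and "Class" target column are hypothetical placeholders for a dataset such as Diabetic Retinopathy Debrecen.

```python
# Minimal sketch: forward selection as attribute reduction before a C4.5-style tree.
# Assumptions: scikit-learn's CART tree with the entropy criterion approximates C4.5;
# the CSV path and "Class" label column are placeholders, not the paper's exact files.
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.feature_selection import SequentialFeatureSelector
from sklearn.metrics import confusion_matrix, accuracy_score

# Placeholder dataset, e.g. Diabetic Retinopathy Debrecen exported as CSV.
data = pd.read_csv("diabetic_retinopathy_debrecen.csv")
X, y = data.drop(columns=["Class"]), data["Class"]
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42, stratify=y
)

# C4.5-style learner: information-based splitting via the entropy criterion.
tree = DecisionTreeClassifier(criterion="entropy", random_state=42)

# Forward selection: greedily add the attribute that most improves CV accuracy,
# stopping when the improvement falls below the tolerance.
selector = SequentialFeatureSelector(
    tree, direction="forward", n_features_to_select="auto", tol=1e-3,
    scoring="accuracy", cv=5
)
selector.fit(X_train, y_train)
selected = X_train.columns[selector.get_support()]
print("Selected attributes:", list(selected))

# Train on the reduced attribute set and evaluate with a confusion matrix.
tree.fit(X_train[selected], y_train)
y_pred = tree.predict(X_test[selected])
print(confusion_matrix(y_test, y_pred))
print("Accuracy:", accuracy_score(y_test, y_pred))
```

In this setup, accuracy with and without the selection step can be compared by also fitting the tree on the full attribute set; the paper reports an improvement of 7.68% from the reduction step.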