Research Article
Accurate Decision Tree with Cost Constraints
@INPROCEEDINGS{10.1007/978-3-319-73317-3_19,
  author={Nan Wang and Jinbao Li and Yong Liu and Jinghua Zhu and Jiaxuan Su and Cheng Peng},
  title={Accurate Decision Tree with Cost Constraints},
  proceedings={Advanced Hybrid Information Processing. First International Conference, ADHIP 2017, Harbin, China, July 17--18, 2017, Proceedings},
  proceedings_a={ADHIP},
  year={2018},
  month={2},
  keywords={Decision tree; Cost constraint; Machine learning; Algorithm of classification},
  doi={10.1007/978-3-319-73317-3_19}
}
Nan Wang
Jinbao Li
Yong Liu
Jinghua Zhu
Jiaxuan Su
Cheng Peng
Year: 2018
Accurate Decision Tree with Cost Constraints
ADHIP
Springer
DOI: 10.1007/978-3-319-73317-3_19
Abstract
A decision tree is a basic classification and regression method that uses a tree-structured model of decisions and their possible consequences, including chance event outcomes, resource costs, and utility. It is an effective approach for classification and, at the same time, a way to display an algorithm. As a classical classification algorithm, the decision tree has many optimized variants. Although these approaches achieve high performance, they usually ignore the acquisition costs of attributes. In some cases these costs differ greatly and matter, so the attribute acquisition cost in a decision tree cannot be ignored. Existing construction approaches for cost-sensitive decision trees fail to generate the decision tree dynamically according to the given data object and cost constraint. In this paper, we attempt to solve this problem. We propose a global decision tree as the model, from which the proper decision tree is derived dynamically according to the data object and cost constraint. To generate these dynamic decision trees, we propose a cost-constraint-based pruning algorithm. Experimental results demonstrate the effectiveness of our approach: although its attribute acquisition cost is much smaller than that of C4.5, the accuracy gap between the two remains small, and for large data sets our approach outperforms C4.5 in both cost and accuracy.
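To make the idea of deriving a tree under a cost constraint concrete, the sketch below prunes a decision tree so that the accumulated attribute acquisition cost along any root-to-leaf path stays within a budget. This is a minimal illustration, not the paper's algorithm: the `Node` structure, the `prune_for_budget` function, and the majority-label fallback are hypothetical names introduced here for exposition.

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class Node:
    """A node in a (hypothetical) global decision tree."""
    attribute: Optional[str] = None   # attribute tested here; None for a leaf
    cost: float = 0.0                 # acquisition cost of that attribute
    children: dict = field(default_factory=dict)  # attribute value -> child Node
    majority_label: str = ""          # fallback prediction if the subtree is pruned

def prune_for_budget(node: Node, budget: float) -> Node:
    """Derive a tree whose every root-to-leaf path fits the cost budget.

    If the budget cannot cover this node's attribute cost, the whole
    subtree collapses into a leaf predicting the majority label;
    otherwise the remaining budget is propagated to the children.
    """
    if node.attribute is None:
        return node  # already a leaf, nothing to acquire
    if node.cost > budget:
        return Node(majority_label=node.majority_label)  # prune to a leaf
    remaining = budget - node.cost
    pruned = Node(node.attribute, node.cost, {}, node.majority_label)
    for value, child in node.children.items():
        pruned.children[value] = prune_for_budget(child, remaining)
    return pruned
```

Under this simplified scheme, classifying a data object with a different budget simply means pruning the same global tree with a different `budget` argument, which mirrors the dynamic, per-object derivation the abstract describes.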