A Three-Level Training Data Filter for Cross-project Defect Prediction

Cangzhou Yuan; Xiaowei Wang; Xinxin Ke; Panpan Zhan

Wireless and Satellite Systems. 11th EAI International Conference, WiSATS 2020, Nanjing, China, September 17-18, 2020, Proceedings, Part I

Research Article

@INPROCEEDINGS{10.1007/978-3-030-69069-4_10,
    author={Cangzhou Yuan and Xiaowei Wang and Xinxin Ke and Panpan Zhan},
    title={A Three-Level Training Data Filter for Cross-project Defect Prediction},
    proceedings={Wireless and Satellite Systems. 11th EAI International Conference, WiSATS 2020, Nanjing, China, September 17-18, 2020, Proceedings, Part I},
    proceedings_a={WISATS},
    year={2021},
    month={2},
    keywords={Machine learning Cross-project defect prediction Transfer learning},
    doi={10.1007/978-3-030-69069-4_10}
}

Cangzhou Yuan, Xiaowei Wang, Xinxin Ke, Panpan Zhan (2021). A Three-Level Training Data Filter for Cross-project Defect Prediction. WISATS. Springer. DOI: 10.1007/978-3-030-69069-4_10

Cangzhou Yuan¹,*, Xiaowei Wang¹, Xinxin Ke¹, Panpan Zhan²

1: School of Software, Beihang University
2: Beijing Institute of Spacecraft System Engineering

*Contact email: yuancz@buaa.edu.cn

Abstract

Cross-project defect prediction aims to predict whether the modules of a target project contain defects by using a prediction model trained on data from other projects. Because the data distributions of different projects diverge, cross-project defect prediction does not perform as well as within-project defect prediction. To reduce this difference as much as possible, researchers have proposed a variety of methods that filter training data from the perspective of transfer learning. In this paper, we introduce a “project-instance-metric” hierarchical filtering strategy to select training data for the defect prediction model. The three-level filtering method selects, in turn, the candidate projects that are most similar to the target project, the instances that are most similar to the target instances, and the metrics with the highest correlation to the prediction result. We compared three-level filtering with project-level filtering, instance-level filtering, and the combination of project-level and instance-level filtering, using four classification algorithms on NASA open source data sets. Our experiments show that the three-level filtering method achieves higher F-measure and AUC values than the single-level training data filtering methods.
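
The abstract does not say which similarity or correlation measures each level uses, so the following is only a minimal sketch of the project-instance-metric idea, not the authors' implementation. It assumes Euclidean distance between mean metric vectors at the project level, nearest neighbours at the instance level, and absolute Pearson correlation with the defect label at the metric level. All function names and parameters (`project_filter`, `instance_filter`, `metric_filter`, `k`, `n_neighbors`, `n_metrics`) are hypothetical.

```python
import numpy as np

def project_filter(candidates, target_X, k=1):
    """Level 1 (assumed measure): keep the k candidate projects whose
    mean metric vector is closest to the target project's mean."""
    t_mean = target_X.mean(axis=0)
    dist = {name: np.linalg.norm(X.mean(axis=0) - t_mean)
            for name, (X, _) in candidates.items()}
    keep = sorted(dist, key=dist.get)[:k]
    return {name: candidates[name] for name in keep}

def instance_filter(X, y, target_X, n_neighbors=2):
    """Level 2 (assumed measure): for every target instance, keep its
    n_neighbors nearest training instances by Euclidean distance."""
    d = np.linalg.norm(target_X[:, None, :] - X[None, :, :], axis=2)
    idx = np.unique(np.argsort(d, axis=1)[:, :n_neighbors])
    return X[idx], y[idx]

def metric_filter(X, y, n_metrics=2):
    """Level 3 (assumed measure): keep the n_metrics columns with the
    highest absolute Pearson correlation to the defect label."""
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    denom = np.sqrt((Xc ** 2).sum(axis=0) * (yc ** 2).sum())
    # Constant columns have zero variance; treat their correlation as 0.
    corr = np.divide(Xc.T @ yc, denom,
                     out=np.zeros(X.shape[1]), where=denom > 0)
    cols = np.sort(np.argsort(-np.abs(corr))[:n_metrics])
    return X[:, cols], cols

def three_level_filter(candidates, target_X, k=1, n_neighbors=2, n_metrics=2):
    """Apply the three levels in order: project, instance, metric."""
    kept = project_filter(candidates, target_X, k)
    X = np.vstack([X for X, _ in kept.values()])
    y = np.concatenate([y for _, y in kept.values()])
    X, y = instance_filter(X, y, target_X, n_neighbors)
    X, cols = metric_filter(X, y, n_metrics)
    return X, y, cols
```

Here `candidates` maps a project name to a pair of a metric matrix and a 0/1 defect label vector, and `target_X` holds the unlabeled target instances. The returned `cols` must also be used to select the same metrics from `target_X` before a classifier trained on the filtered data is applied.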

Keywords: Machine learning, Cross-project defect prediction, Transfer learning

Published: 2021-02-28
Appears in: SpringerLink

DOI link: http://dx.doi.org/10.1007/978-3-030-69069-4_10
