
Research Article
An Empirical Study of Model-Agnostic Interpretation Technique for Just-in-Time Software Defect Prediction
@INPROCEEDINGS{10.1007/978-3-030-92635-9_25,
  author={Xingguang Yang and Huiqun Yu and Guisheng Fan and Zijie Huang and Kang Yang and Ziyi Zhou},
  title={An Empirical Study of Model-Agnostic Interpretation Technique for Just-in-Time Software Defect Prediction},
  proceedings={Collaborative Computing: Networking, Applications and Worksharing. 17th EAI International Conference, CollaborateCom 2021, Virtual Event, October 16-18, 2021, Proceedings, Part I},
  proceedings_a={COLLABORATECOM},
  year={2022},
  month={1},
  keywords={Software defect prediction, Just-in-time, Classifier-agnostic interpretation, Model interpretation},
  doi={10.1007/978-3-030-92635-9_25}
}
Xingguang Yang
Huiqun Yu
Guisheng Fan
Zijie Huang
Kang Yang
Ziyi Zhou
Year: 2022
COLLABORATECOM
Springer
DOI: 10.1007/978-3-030-92635-9_25
Abstract
Just-in-time software defect prediction (JIT-SDP) is an effective method of software quality assurance whose objective is to use machine learning to identify defective code changes. However, existing research focuses only on the predictive power of JIT-SDP models and ignores their interpretability. The need for interpretability arises for two main reasons: (1) developers expect to understand the decision-making process of the JIT-SDP model and to obtain guidance and insights from it; (2) the prediction results of the JIT-SDP model affect the interests of developers. According to privacy protection laws, prediction models need to provide explanations. To this end, we introduced three classifier-agnostic (CA) techniques, namely LIME, BreakDown, and SHAP, for JIT-SDP models and conducted a large-scale empirical study on six open-source projects. The empirical results show that: (1) different instances have different explanations: on average, the feature ranking difference of two random instances is 3; (2) for a given system, the feature lists and top-1 features generated by different CA techniques agree strongly, whereas the techniques show little agreement on the top-3 features in the feature ranking lists. In actual software development, we suggest using CA techniques to help developers understand the model's prediction results.
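The abstract compares explanations by feature ranking difference and by top-k agreement but does not define either measure. The sketch below shows one plausible reading, under stated assumptions: features are ranked by the absolute value of their contribution, the ranking difference is the mean absolute difference in rank position, and agreement is the fraction of shared top-k features. The function names, feature names, and contribution scores are all hypothetical, and the paper's actual definitions may differ.

```python
from typing import Dict, List


def rank_features(contributions: Dict[str, float]) -> List[str]:
    """Order features from most to least influential by absolute contribution.

    Ties are broken alphabetically so the ranking is deterministic.
    """
    return sorted(contributions, key=lambda name: (-abs(contributions[name]), name))


def mean_rank_difference(a: Dict[str, float], b: Dict[str, float]) -> float:
    """Average absolute difference in rank position over a shared feature set.

    Assumption: this is one possible definition of "feature ranking difference".
    """
    rank_a = {name: i for i, name in enumerate(rank_features(a))}
    rank_b = {name: i for i, name in enumerate(rank_features(b))}
    if rank_a.keys() != rank_b.keys():
        raise ValueError("both explanations must cover the same features")
    return sum(abs(rank_a[n] - rank_b[n]) for n in rank_a) / len(rank_a)


def top_k_overlap(a: Dict[str, float], b: Dict[str, float], k: int) -> float:
    """Fraction of the top-k features that two explanations have in common."""
    top_a = set(rank_features(a)[:k])
    top_b = set(rank_features(b)[:k])
    return len(top_a & top_b) / k


# Hypothetical per-feature contribution scores for two explanations.
# They could come from two instances or from two CA techniques.
explanation_1 = {
    "lines_added": 0.40,
    "lines_deleted": -0.10,
    "files_touched": 0.25,
    "author_experience": 0.05,
}
explanation_2 = {
    "lines_added": 0.05,
    "lines_deleted": 0.30,
    "files_touched": -0.20,
    "author_experience": 0.10,
}

print(mean_rank_difference(explanation_1, explanation_2))
print(top_k_overlap(explanation_1, explanation_2, k=1))
print(top_k_overlap(explanation_1, explanation_2, k=3))
```

In practice the contribution dictionaries would be filled from the output of a CA technique such as LIME, BreakDown, or SHAP rather than written by hand.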