8th International Conference on Bio-inspired Information and Communications Technologies (formerly BIONETICS)

Research Article

An Empirical Comparison of Machine Learning Techniques for Software Defect Prediction

  • @INPROCEEDINGS{10.4108/icst.bict.2014.257871,
        author={Ruchika Malhotra and Rajeev Raje},
        title={An Empirical Comparison of Machine Learning Techniques for Software Defect Prediction},
        proceedings={8th International Conference on Bio-inspired Information and Communications Technologies (formerly BIONETICS)},
        publisher={ICST},
        proceedings_a={BICT},
        year={2015},
        month={2},
        keywords={defect prediction, object-oriented metrics, machine learning, empirical validation},
        doi={10.4108/icst.bict.2014.257871}
    }
    
  • Ruchika Malhotra
    Rajeev Raje
    Year: 2015
    An Empirical Comparison of Machine Learning Techniques for Software Defect Prediction
    BICT
    ACM
    DOI: 10.4108/icst.bict.2014.257871
Ruchika Malhotra1,*, Rajeev Raje1
  • 1: Indiana University Purdue University
*Contact email: ruchmalh@cs.iupui.edu

Abstract

Software systems are exposed to various types of defects. Identifying defective classes in the early phases of software development is essential for reducing the cost of testing. Software metrics can be used in conjunction with defect data to develop models that predict defective classes. Various machine learning techniques have been proposed in the literature for analyzing complex relationships and extracting useful information from problems in less time. However, more studies comparing these techniques are needed to provide evidence, so that confidence in the performance of one technique over another can be established. In this paper we address four issues in existing studies: (i) comparison of machine learning techniques over data sets that are not widely used, (ii) use of inappropriate performance measures for assessing defect prediction models, (iii) limited use of statistical tests, and (iv) validation of models on the same data set on which they were trained. To resolve these issues, we compare 18 machine learning techniques to investigate the effect of object-oriented metrics on defective classes. The results are validated on six releases of the ‘MMS’ application package of Android, a recent and widely used mobile operating system. Overall, the results indicate the predictive capability of the machine learning techniques and endorse one particular ML technique for predicting defects.
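
Issue (iv) above, validating a model on data other than what it was trained on, can be illustrated with a minimal sketch of inter-release validation: train on one release and test on the next. Everything in the sketch is an assumption made for illustration, not the paper's procedure. The abstract does not name its performance measure, so AUC is assumed here; `fit_toy_scorer` is a hypothetical stand-in for a real learner; the release names and metric values in `TOY_DATA` are made up and are not the Android MMS data.

```python
from statistics import mean
from typing import Callable, Dict, List, Sequence, Tuple

Sample = Tuple[float, int]  # (metric value, 1 if the class is defective else 0)


def auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Area under the ROC curve via the pairwise (Mann-Whitney) formulation."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one defective and one clean class")
    # Count pairs where the defective class scores higher; ties count half.
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def inter_release_pairs(releases: Sequence[str]) -> List[Tuple[str, str]]:
    """Pair each release with its successor: train on the first, test on the second."""
    return list(zip(releases, releases[1:]))


def fit_toy_scorer(train: Sequence[Sample]) -> Callable[[float], float]:
    """Hypothetical learner: orient the metric so higher scores mean 'more defect-prone'."""
    defective = [x for x, y in train if y == 1]
    clean = [x for x, y in train if y == 0]
    sign = 1.0 if mean(defective) >= mean(clean) else -1.0
    return lambda x: sign * x


def evaluate(data: Dict[str, List[Sample]],
             releases: Sequence[str]) -> Dict[Tuple[str, str], float]:
    """Train on release i, score release i+1, and report AUC for each pair."""
    results = {}
    for train_rel, test_rel in inter_release_pairs(releases):
        scorer = fit_toy_scorer(data[train_rel])
        test = data[test_rel]
        results[(train_rel, test_rel)] = auc(
            [scorer(x) for x, _ in test], [y for _, y in test]
        )
    return results


# Made-up values for illustration only; not taken from the paper.
TOY_DATA: Dict[str, List[Sample]] = {
    "r1": [(2.0, 0), (3.0, 0), (8.0, 1), (9.0, 1)],
    "r2": [(1.0, 0), (7.0, 1), (4.0, 0), (6.0, 1)],
    "r3": [(5.0, 1), (2.0, 0), (6.0, 0), (4.0, 1)],
}

if __name__ == "__main__":
    for (train_rel, test_rel), value in evaluate(TOY_DATA, ["r1", "r2", "r3"]).items():
        print(f"train={train_rel} test={test_rel} AUC={value:.2f}")
```

Because the test release is never seen during fitting, each AUC value reflects how well a model carries over to later code rather than how well it memorizes its own training data. A real study would replace `fit_toy_scorer` with each of the compared learners and follow up with a statistical test across the resulting scores.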