Bio-Inspired Models of Network, Information, and Computing Systems. 5th International ICST Conference, BIONETICS 2010, Boston, USA, December 1-3, 2010, Revised Selected Papers

Research Article

An Empirical Study of Predictive Modeling Techniques of Software Quality

  • @INPROCEEDINGS{10.1007/978-3-642-32615-8_29,
        author={Taghi Khoshgoftaar and Kehan Gao and Amri Napolitano},
        title={An Empirical Study of Predictive Modeling Techniques of Software Quality},
        proceedings={Bio-Inspired Models of Network, Information, and Computing Systems. 5th International ICST Conference, BIONETICS 2010, Boston, USA, December 1-3, 2010, Revised Selected Papers},
        proceedings_a={BIONETICS},
        year={2012},
        month={10},
        keywords={filter-based feature ranking techniques, software defect prediction, software metrics, software quality},
        doi={10.1007/978-3-642-32615-8_29}
    }
    
  • Taghi Khoshgoftaar
    Kehan Gao
    Amri Napolitano
    Year: 2012
    An Empirical Study of Predictive Modeling Techniques of Software Quality
    BIONETICS
    Springer
    DOI: 10.1007/978-3-642-32615-8_29
Taghi Khoshgoftaar (1,*), Kehan Gao (2,*), Amri Napolitano (1,*)
  • 1: Florida Atlantic University
  • 2: Eastern Connecticut State University
*Contact email: taghi@cse.fau.edu, gaok@easternct.edu, amrifau@gmail.com

Abstract

The primary goal of software quality engineering is to apply techniques and processes that produce a high-quality software product. One strategy is to apply data mining techniques to the software metrics and defect data collected during development in order to identify potentially low-quality program modules. In this paper, we investigate the use of feature selection in the context of software quality estimation (also referred to as software defect prediction), where a classification model predicts whether program modules (instances) are fault-prone or not-fault-prone. Seven filter-based feature ranking techniques are examined. Six of them are commonly used; the seventh, signal-to-noise ratio (SNR), is rarely employed. The objective of the paper is to compare these seven techniques across a range of software data sets and assess their effectiveness for software quality modeling. A case study is performed on 16 software data sets, and classification models are built with five different learners. The experimental results are summarized using statistical tests of significance. The main conclusion is that the SNR technique performs as well as or better than the best performer among the six commonly used techniques.
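
The abstract does not define how the SNR ranker scores a feature. As a rough illustration only, the sketch below assumes the common signal-to-noise definition from the feature selection literature: the absolute difference between the two class means divided by the sum of the two class standard deviations. The function names, the toy metric values, and the handling of zero spread are all assumptions made for this example and are not taken from the paper.

```python
from statistics import mean, pstdev


def snr_scores(X, y):
    """Return one signal-to-noise score per feature column.

    X: list of rows, each a list of numeric software metric values.
    y: list of labels, 1 = fault-prone, 0 = not-fault-prone.
    Assumes both classes appear at least once in y.
    """
    pos = [row for row, label in zip(X, y) if label == 1]
    neg = [row for row, label in zip(X, y) if label == 0]
    scores = []
    for j in range(len(X[0])):
        p = [row[j] for row in pos]
        n = [row[j] for row in neg]
        gap = abs(mean(p) - mean(n))
        spread = pstdev(p) + pstdev(n)
        if spread == 0:
            # Assumed convention: a feature that is constant within both
            # classes separates them perfectly if the means differ.
            scores.append(float("inf") if gap > 0 else 0.0)
        else:
            scores.append(gap / spread)
    return scores


def rank_features(X, y, k):
    """Return the indices of the k highest-scoring features, best first."""
    scores = snr_scores(X, y)
    order = sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)
    return order[:k]


# Hypothetical toy data: three metrics per module (columns), six modules (rows).
X = [
    [120, 5, 14],
    [150, 3, 18],
    [130, 4, 16],
    [40, 4, 3],
    [60, 5, 5],
    [50, 3, 4],
]
y = [1, 1, 1, 0, 0, 0]

scores = snr_scores(X, y)
top = rank_features(X, y, 2)
print(scores, top)
```

In a filter-based setup like the one described, the selected columns would then be passed to a separate classifier; the ranking step itself never consults the learner.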