Security and Privacy in Communication Networks. 9th International ICST Conference, SecureComm 2013, Sydney, NSW, Australia, September 25-28, 2013, Revised Selected Papers

Research Article

Clonewise – Detecting Package-Level Clones Using Machine Learning

  • @INPROCEEDINGS{10.1007/978-3-319-04283-1_13,
        author={Silvio Cesare and Yang Xiang and Jun Zhang},
        title={Clonewise -- Detecting Package-Level Clones Using Machine Learning},
        proceedings={Security and Privacy in Communication Networks. 9th International ICST Conference, SecureComm 2013, Sydney, NSW, Australia, September 25-28, 2013, Revised Selected Papers},
        proceedings_a={SECURECOMM},
        year={2014},
        month={6},
        keywords={Vulnerability detection, code clone, Linux},
        doi={10.1007/978-3-319-04283-1_13}
    }
    
  • Silvio Cesare
    Yang Xiang
    Jun Zhang
    Year: 2014
    Clonewise – Detecting Package-Level Clones Using Machine Learning
    SECURECOMM
    Springer
    DOI: 10.1007/978-3-319-04283-1_13
Silvio Cesare1,*, Yang Xiang1,*, Jun Zhang1,*
  • 1: Deakin University
*Contact email: scesare@deakin.edu.au, yang@deakin.edu.au, jun.zhang@deakin.edu.au

Abstract

Developers sometimes maintain an internal copy of another software package or fork an existing project. This practice can lead to software vulnerabilities when the embedded code is not kept up to date with upstream sources. We propose an automated solution to identify clones of packages without any prior knowledge of these relationships. We then correlate clones with vulnerability information to identify outstanding security problems. This approach motivates software maintainers to avoid using cloned packages and to link against system-wide libraries instead. We propose over 30 novel features that enable us to use pattern classification to accurately identify package-level clones. To our knowledge, we are the first to consider clone detection as a classification problem. Our results show that our system, Clonewise, compares well to manually tracked databases. Based on our work, over 30 previously unknown package clones and vulnerabilities have been identified and patched.