sis 19(22): e3

Research Article

Scalable Source Code Similarity Detection in Large Code Repositories

Cite (BibTeX)
@ARTICLE{10.4108/eai.13-7-2018.159353,
    author={Firas Alomari and Muhammed Harbi},
    title={Scalable Source Code Similarity Detection in Large Code Repositories},
    journal={EAI Endorsed Transactions on Scalable Information Systems},
    volume={6},
    number={22},
    publisher={EAI},
    journal_a={SIS},
    year={2019},
    month={7},
    keywords={clones, software similarity, Control Flow Graphs, Fingerprints},
    doi={10.4108/eai.13-7-2018.159353}
}
Firas Alomari¹*, Muhammed Harbi¹
¹ Corporate Applications Department, Saudi Aramco, Dhahran, Saudi Arabia
* Contact email: firas.alomari@aramco.com

Abstract

Source code similarity detection is increasingly used in application development to identify clones, isolate bugs, and find copyright violations. Similar code fragments are problematic because an error in the original code must be fixed in every copy, and other maintenance changes, such as extensions or patches, must likewise be applied multiple times. Furthermore, the diversity of coding styles and the flexibility of modern languages make manual inspection of large code repositories difficult and costly, so detection is only feasible with automatic techniques. We present an efficient and scalable approach for identifying similar code fragments based on fingerprinting source code control flow graphs. The source code is processed to generate control flow graphs, which are then hashed to create a unique fingerprint of the code that captures semantic as well as syntactic similarity. The fingerprints can then be efficiently stored and retrieved to perform similarity search between code fragments. Experimental results from our prototype implementation support the validity of our approach and show its effectiveness and efficiency in comparison with other solutions.
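The abstract describes the pipeline only at a high level: build control flow graphs, hash them into fingerprints, and index the fingerprints for similarity search. It does not say which graph hashing or storage scheme the authors used, so the following is a minimal Python sketch under stated assumptions, not the paper's method. It builds a simplified statement-level CFG with the standard `ast` module, substitutes a Weisfeiler-Lehman-style neighborhood hash as the fingerprint, scores pairs with Jaccard similarity, and uses an inverted index to retrieve candidates. All names (`build_cfg`, `fingerprint`, `similarity`, `FingerprintIndex`, `functions`) and parameters (`rounds`, `threshold`) are hypothetical.

```python
import ast
import hashlib
from collections import defaultdict

class CFG:  # hypothetical statement-level CFG; nodes carry only AST type names
    def __init__(self):
        self.labels = []              # node id -> label, e.g. "If", "Assign"
        self.succ = defaultdict(set)  # node id -> successor node ids
    def add(self, label):
        self.labels.append(label)
        return len(self.labels) - 1

def build_cfg(func):
    """Simplified CFG for one ast.FunctionDef; identifiers and literals are ignored."""
    g = CFG()
    entry, exit_ = g.add("Entry"), g.add("Exit")

    def visit(stmts, preds):
        # preds: nodes whose control falls through into the next statement
        for s in stmts:
            n = g.add(type(s).__name__)
            for p in preds:
                g.succ[p].add(n)
            if isinstance(s, ast.If):
                preds = visit(s.body, [n]) + visit(s.orelse, [n])
            elif isinstance(s, (ast.For, ast.While)):
                for p in visit(s.body, [n]):
                    g.succ[p].add(n)          # loop back edge
                preds = visit(s.orelse, [n])  # loop exit, via any else clause
            elif isinstance(s, ast.Return):
                g.succ[n].add(exit_)
                preds = []
            else:  # simple or unsupported statement (try, break, ...): one opaque node
                preds = [n]
        return preds

    for p in visit(func.body, [entry]):
        g.succ[p].add(exit_)
    return g

def _h(text):
    # Deterministic short hash (built-in hash() of str is salted per process)
    return hashlib.sha1(text.encode()).hexdigest()[:16]

def fingerprint(g, rounds=2):
    """WL-style fingerprint: hashes of each node's label combined with its successors'."""
    labels = [_h(lab) for lab in g.labels]
    features = set(labels)
    for _ in range(rounds):
        labels = [_h(labels[i] + "|" + ",".join(sorted(labels[j] for j in g.succ[i])))
                  for i in range(len(labels))]
        features.update(labels)
    return frozenset(features)

def similarity(a, b):
    return len(a & b) / len(a | b) if a | b else 1.0  # Jaccard index

class FingerprintIndex:
    """Inverted index from feature hash to the fragments containing it."""
    def __init__(self):
        self.prints, self.postings = {}, defaultdict(set)
    def add(self, name, fp):
        self.prints[name] = fp
        for f in fp:
            self.postings[f].add(name)
    def query(self, fp, threshold=0.5):
        # Only fragments sharing at least one feature are scored
        cands = set().union(*(self.postings.get(f, set()) for f in fp))
        scored = [(n, similarity(fp, self.prints[n])) for n in cands]
        return sorted((s for s in scored if s[1] >= threshold), key=lambda s: -s[1])

def functions(source):
    return {n.name: fingerprint(build_cfg(n))
            for n in ast.walk(ast.parse(source)) if isinstance(n, ast.FunctionDef)}

SRC = """
def total(xs):
    s = 0
    for x in xs:
        if x > 0:
            s += x
    return s
def acc(values):
    r = 0
    for v in values:
        if v > 0:
            r += v
    return r
def first(xs):
    while xs:
        return xs[0]
"""

fps = functions(SRC)
idx = FingerprintIndex()
for name, fp in fps.items():
    idx.add(name, fp)
print(idx.query(fps["total"]))  # total and acc score 1.0; first falls below 0.5
```

Because node labels keep only statement types, `total` and `acc`, which differ only in identifier names, get identical fingerprints, while `first` shares only the features around its entry, exit, and return nodes. The inverted index means a query is scored only against fragments that share at least one feature, which is the property that matters when the repository is large; a realistic tool would likely also need to model more statement kinds and tune the threshold.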

Keywords
clones, software similarity, Control Flow Graphs, Fingerprints
Received: 2019-04-05
Accepted: 2019-05-20
Published: 2019-07-04
Publisher: EAI
http://dx.doi.org/10.4108/eai.13-7-2018.159353

Copyright © 2019 Firas Alomari et al., licensed to EAI. This is an open access article distributed under the terms of the Creative Commons Attribution license (http://creativecommons.org/licenses/by/3.0/), which permits unlimited use, distribution and reproduction in any medium so long as the original work is properly cited.
