
Research Article
Learning AI Coding Style for Software Plagiarism Detection
@INPROCEEDINGS{10.1007/978-3-031-64954-7_24,
  author={Sri Haritha Ambati and Natalia Stakhanova and Enrico Branca},
  title={Learning AI Coding Style for Software Plagiarism Detection},
  proceedings={Security and Privacy in Communication Networks. 19th EAI International Conference, SecureComm 2023, Hong Kong, China, October 19-21, 2023, Proceedings, Part II},
  proceedings_a={SECURECOMM PART 2},
  year={2024},
  month={10},
  keywords={plagiarism detection code attribution AI-generated code},
  doi={10.1007/978-3-031-64954-7_24}
}
- Sri Haritha Ambati
- Natalia Stakhanova
- Enrico Branca
Year: 2024
SECURECOMM PART 2
Springer
DOI: 10.1007/978-3-031-64954-7_24
Abstract
Software plagiarism is the reuse of software code without proper attribution and in violation of software licensing agreements or copyright laws. With the popularity of open-source software and the rapid emergence of AI large language models such as ChatGPT and Google Bard, concerns about plagiarized AI-generated code have been rising. Code attribution has been used to aid in detecting software plagiarism. In this paper, we investigate the authorship of AI-generated code. We analyze the feasibility of using code attribution approaches to verify the authorship of source code generated by AI-based tools and investigate scenarios in which plagiarized AI code can be identified. We perform an attribution analysis of AI-generated source code on a large sample of programs written by software developers and generated by the ChatGPT and Google Bard tools. We believe our work offers valuable insights for both academia and the software development community while contributing to research on the authorship style of the fast-growing AI conversational models ChatGPT and Bard.