API Misuse Detection Based on Stacked LSTM

Shuyin OuYang; Fan Ge; Li Kuang; Yuyu Yin

Collaborative Computing: Networking, Applications and Worksharing. 16th EAI International Conference, CollaborateCom 2020, Shanghai, China, October 16–18, 2020, Proceedings, Part I

Research Article

Cite (BibTeX):

@INPROCEEDINGS{10.1007/978-3-030-67537-0_26,
    author={Shuyin OuYang and Fan Ge and Li Kuang and Yuyu Yin},
    title={API Misuse Detection Based on Stacked LSTM},
    proceedings={Collaborative Computing: Networking, Applications and Worksharing. 16th EAI International Conference, CollaborateCom 2020, Shanghai, China, October 16--18, 2020, Proceedings, Part I},
    proceedings_a={COLLABORATECOM},
    year={2021},
    month={1},
    keywords={API misuse detection, Static analysis, Pre-training model, Semantic representation, LSTM},
    doi={10.1007/978-3-030-67537-0_26}
}

Shuyin OuYang¹, Fan Ge¹, Li Kuang¹,*, Yuyu Yin²

1: School of Computer Science and Engineering, Central South University, Changsha
2: Hangzhou Dianzi University, Hangzhou

*Contact email: kuangli@csu.edu.cn

Abstract

In modern software engineering, APIs (Application Programming Interfaces) are widely used to develop applications rapidly by reusing data structures, frameworks, class libraries, etc. However, because of the large number of interfaces, the lack of documentation, and the lack of timely maintenance and updates, APIs are often used incorrectly. Detecting API misuse automatically has therefore become an important problem. Many existing automatic API misuse detection methods do not make full use of the potential semantic information of APIs or of the independent integrity of each API. In this paper, we employ a stacked LSTM to learn API usage specifications and detect API misuse defects. Specifically, first, we obtain the ACSG (API Call Syntax Graph) through static analysis of the source code. Second, based on the ACSG, we generate API sequences and transform them into <previous API sequence, next API> pairs for training. Third, to represent the APIs semantically, we apply word2vec as a pre-training model to embed the features of each API. Through the stacked LSTM model, we take the embedded previous API sequence as input to model the API usage specifications, and we discover potential API misuse defects by judging whether the next API appears in the output (an API probability list). We design experiments to evaluate the effectiveness of our method on the Java Cryptography APIs and the code that uses them, and the results show the advantage of our proposed method.
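
The data preparation and the decision rule described in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the word2vec embedding and the stacked LSTM are replaced by a simple frequency-based stand-in so that the example runs with the Python standard library alone, and the names `to_training_pairs`, `FrequencyNextApiModel`, `find_misuses`, and the `top_k` cutoff are hypothetical.

```python
from collections import Counter, defaultdict


def to_training_pairs(api_sequence):
    """Split one API call sequence into (previous API sequence, next API) pairs."""
    return [(tuple(api_sequence[:i]), api_sequence[i])
            for i in range(1, len(api_sequence))]


class FrequencyNextApiModel:
    """Stand-in for the stacked LSTM (an assumption made for this sketch).

    It ranks candidate next APIs by how often they followed the most
    recent call in the training pairs. A real model would consume the
    whole embedded previous sequence instead of only its last element.
    """

    def __init__(self):
        self._followers = defaultdict(Counter)

    def fit(self, pairs):
        for previous, next_api in pairs:
            self._followers[previous[-1]][next_api] += 1
        return self

    def predict_proba(self, previous):
        """Return (API, probability) tuples sorted by descending probability."""
        counts = self._followers.get(previous[-1], Counter())
        total = sum(counts.values())
        return [(api, n / total) for api, n in counts.most_common()]


def find_misuses(model, api_sequence, top_k=3):
    """Flag calls whose actual next API is absent from the top-k predictions."""
    flagged = []
    for previous, next_api in to_training_pairs(api_sequence):
        candidates = [api for api, _ in model.predict_proba(previous)[:top_k]]
        if next_api not in candidates:
            flagged.append((len(previous), next_api))
    return flagged


# Hypothetical training data: two correct usages of javax.crypto.Cipher.
correct_usages = [
    ["Cipher.getInstance", "Cipher.init", "Cipher.doFinal"],
    ["Cipher.getInstance", "Cipher.init", "Cipher.update", "Cipher.doFinal"],
]
pairs = [p for seq in correct_usages for p in to_training_pairs(seq)]
model = FrequencyNextApiModel().fit(pairs)

# The suspect sequence skips Cipher.init before Cipher.doFinal.
suspect = ["Cipher.getInstance", "Cipher.doFinal"]
print(find_misuses(model, suspect))  # [(1, 'Cipher.doFinal')]
```

Each flagged entry gives the position in the sequence and the unexpected API. The `top_k` cutoff trades false alarms against missed defects: a larger value tolerates rarer but valid usages, while a smaller value flags more aggressively.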

Keywords: API misuse detection, Static analysis, Pre-training model, Semantic representation, LSTM

Published: 2021-01-22
Appears in: SpringerLink

DOI: http://dx.doi.org/10.1007/978-3-030-67537-0_26
