
Research Article
Automated Software Vulnerability Detection via Pre-trained Context Encoder and Self Attention
@INPROCEEDINGS{10.1007/978-3-031-06365-7_15,
  author={Na Li and Haoyu Zhang and Zhihui Hu and Guang Kou and Huadong Dai},
  title={Automated Software Vulnerability Detection via Pre-trained Context Encoder and Self Attention},
  proceedings={Digital Forensics and Cyber Crime. 12th EAI International Conference, ICDF2C 2021, Virtual Event, Singapore, December 6-9, 2021, Proceedings},
  proceedings_a={ICDF2C},
  year={2022},
  month={6},
  keywords={Automated vulnerability detection, Self attention, Pre-trained language model, Transfer learning},
  doi={10.1007/978-3-031-06365-7_15}
}
Na Li
Haoyu Zhang
Zhihui Hu
Guang Kou
Huadong Dai
Year: 2022
ICDF2C
Springer
DOI: 10.1007/978-3-031-06365-7_15
Abstract
With the increasing size and complexity of modern software projects, it is almost impossible to discover all software vulnerabilities in time through manual analysis. Most existing vulnerability detection methods rely on manually designed vulnerability features, which are costly to build and lead to high false positive rates. Pre-trained models for programming languages, which additionally take the syntactic-level structure of code into account, have brought dramatic improvements to code-related tasks. We therefore propose an automated vulnerability detection method based on a pre-trained context encoder and a self-attention mechanism. Unlike current static analysis approaches, we treat program source code as natural language and introduce a pre-trained contextualized language model to capture local dependencies in the program and learn a better contextualized representation. The extracted source code feature vectors are then fed into a purpose-built Self Attention Networks (SAN) module. We build the SAN module on a Long Short-Term Memory (LSTM) model and self-attention, which lets it learn the long-range dependencies of vulnerable program points more efficiently. We conduct experiments on two source-code-level C program benchmark datasets, using four evaluation metrics to compare the vulnerability detection performance of different systems. Extensive experimental results demonstrate that our proposed model outperforms the previous state-of-the-art automated vulnerability detection method by around 7.2% in F1-measure and 2.6% in precision.
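The self-attention step mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' SAN module: it omits the pre-trained encoder, the LSTM, and any learned query/key/value projections, and shows only plain scaled dot-product self-attention over a sequence of vectors. The function names and the toy input values are hypothetical, chosen for illustration.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(hidden):
    """Scaled dot-product self-attention without learned projections.

    hidden: list of n vectors (each a list of d floats), for example
    per-token states produced by an encoder.
    Returns (outputs, weights), where weights[i][j] is how strongly
    position i attends to position j, and outputs[i] is the resulting
    weighted mix of all input vectors.
    """
    d = len(hidden[0])
    scale = math.sqrt(d)
    outputs, weights = [], []
    for query in hidden:
        # Similarity of this position to every position, itself included.
        scores = [sum(q * k for q, k in zip(query, key)) / scale
                  for key in hidden]
        row = softmax(scores)
        weights.append(row)
        # Weighted sum of all vectors, computed one coordinate at a time.
        outputs.append([sum(w * vec[c] for w, vec in zip(row, hidden))
                        for c in range(d)])
    return outputs, weights


# Hypothetical 2-dimensional states for a four-token snippet (made-up values).
states = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.5, 0.5]]
outputs, weights = self_attention(states)
```

Because every position is compared with every other position directly, a token late in a function can attend to a distant earlier token in one step, which is the property the abstract relies on when it refers to long-range dependencies.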