Automated Software Vulnerability Detection via Pre-trained Context Encoder and Self Attention

Na Li; Haoyu Zhang; Zhihui Hu; Guang Kou; Huadong Dai

Digital Forensics and Cyber Crime. 12th EAI International Conference, ICDF2C 2021, Virtual Event, Singapore, December 6-9, 2021, Proceedings

Research Article

@INPROCEEDINGS{10.1007/978-3-031-06365-7_15,
    author={Na Li and Haoyu Zhang and Zhihui Hu and Guang Kou and Huadong Dai},
    title={Automated Software Vulnerability Detection via Pre-trained Context Encoder and Self Attention},
    proceedings={Digital Forensics and Cyber Crime. 12th EAI International Conference, ICDF2C 2021, Virtual Event, Singapore, December 6-9, 2021, Proceedings},
    proceedings_a={ICDF2C},
    year={2022},
    month={6},
    keywords={Automated vulnerability detection, Self attention, Pre-trained language model, Transfer learning},
    doi={10.1007/978-3-031-06365-7_15}
}

Na Li, Haoyu Zhang, Zhihui Hu, Guang Kou, Huadong Dai (2022). Automated Software Vulnerability Detection via Pre-trained Context Encoder and Self Attention. ICDF2C. Springer. DOI: 10.1007/978-3-031-06365-7_15

Na Li¹, Haoyu Zhang¹, Zhihui Hu¹, Guang Kou¹, Huadong Dai¹,*

1: Artificial Intelligence Research Center

*Contact email: hddai@vip.163.com

Abstract

As modern software projects grow in size and complexity, it is almost impossible to discover all software vulnerabilities in time through manual analysis. Most existing vulnerability detection methods rely on manually designed vulnerability features, which are costly to build and lead to high false positive rates. Pre-trained models for programming languages, which additionally take the syntactic-level structure of code into account, have brought dramatic improvements to code-related tasks. We therefore propose an automated vulnerability detection method based on a pre-trained context encoder and a self-attention mechanism. Unlike current static analysis approaches, we treat program source code as natural language and introduce a pre-trained contextualized language model to capture local dependencies in the program and learn a better contextualized representation. The extracted source code feature vectors are then fed into a purpose-built Self Attention Networks (SAN) module. The SAN module, which is based on a Long Short-Term Memory (LSTM) model and self-attention, learns the long-range dependencies of vulnerable program points more efficiently. We conduct experiments on two benchmark datasets of C programs at the source code level, using four evaluation metrics to compare the vulnerability detection performance of different systems. Extensive experimental results demonstrate that our proposed model outperforms the previous state-of-the-art automated vulnerability detection method by around 7.2% in F1-measure and 2.6% in precision.
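
To make the self-attention step mentioned in the abstract more concrete, the following is a minimal sketch of generic scaled dot-product self-attention over a sequence of per-token vectors. It is not the authors' SAN module, whose details the abstract does not give: there are no learned query/key/value projections, no LSTM, and no pre-trained encoder. The names `softmax`, `self_attention`, and `hidden_states` and the toy vectors are assumptions made purely for illustration.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    largest = max(scores)
    exps = [math.exp(s - largest) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(states):
    """Scaled dot-product self-attention with queries = keys = values = states.

    states: list of equal-length float vectors, one per token.
    Returns (contexts, weights), where weights[i][j] is how strongly
    token i attends to token j, and contexts[i] is the weighted sum
    of all token vectors for token i.
    """
    dim = len(states[0])
    scale = math.sqrt(dim)
    contexts, weights = [], []
    for query in states:
        # Similarity of this token to every token in the sequence.
        scores = [sum(q * k for q, k in zip(query, key)) / scale for key in states]
        row = softmax(scores)
        weights.append(row)
        # Mix all token vectors according to the attention weights.
        contexts.append(
            [sum(w * vec[d] for w, vec in zip(row, states)) for d in range(dim)]
        )
    return contexts, weights


# Hypothetical hidden states for a four-token code snippet (illustrative only).
hidden_states = [
    [1.0, 0.0, 0.0],
    [0.0, 1.0, 0.0],
    [0.0, 0.0, 1.0],
    [1.0, 0.0, 0.0],
]
contexts, weights = self_attention(hidden_states)
```

In this toy input the first and last tokens share the same vector, so the first token attends to the last one as strongly as to itself, regardless of the distance between them. That distance-independence is the general property that lets attention relate far-apart positions in a sequence.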

Keywords: Automated vulnerability detection, Self attention, Pre-trained language model, Transfer learning

Published: 2022-06-04
Appears in: SpringerLink

http://dx.doi.org/10.1007/978-3-031-06365-7_15
