Binary Code Similarity Detection through LSTM and Siamese Neural Network

Zhengping Luo; Tao Hou; Xiangrong Zhou; Hui Zeng; Zhuo Lu

sesa 21(29): e1

Research Article

Cite (BibTeX):

@ARTICLE{10.4108/eai.14-9-2021.170956,
    author={Zhengping Luo and Tao Hou and Xiangrong Zhou and Hui Zeng and Zhuo Lu},
    title={Binary Code Similarity Detection through LSTM and Siamese Neural Network},
    journal={EAI Endorsed Transactions on Security and Safety},
    volume={8},
    number={29},
    publisher={EAI},
    journal_a={SESA},
    year={2021},
    month={9},
    keywords={Malware detection, binary analysis, LSTM, Siamese Neural Network, similarity detection},
    doi={10.4108/eai.14-9-2021.170956}
}

Zhengping Luo¹,*, Tao Hou², Xiangrong Zhou³, Hui Zeng³, Zhuo Lu²

1: Department of Computer Science & Physics, Rider University, Lawrenceville, NJ 08648, USA
2: Computer Science Engineering and Electrical Engineering, University of South Florida, Tampa FL 33620, USA
3: Intelligent Automation Inc., Rockville MD 20855, USA

*Contact email: zhengpingluo@mail.usf.edu

Abstract

Because many software projects are closed-source, analyzing security vulnerabilities at the binary level is essential for protecting computer systems from malware attacks. Binary code similarity detection is a potential solution for detecting malware from the binaries generated by the processor. In this paper, we propose a malware detection mechanism that operates on binaries using machine learning techniques. Using a Recurrent Neural Network (RNN), specifically a Long Short-Term Memory (LSTM) network, we generate a uniform feature embedding of each binary file, and then use a Siamese Neural Network to compute a similarity measure over the extracted features. The security risk of a software project can therefore be evaluated through the similarity between its binaries and the known malware used for training. Our real-world experimental results demonstrate convincing performance in distinguishing outliers and slightly better performance than existing state-of-the-art methods.
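
The pipeline described in the abstract (an LSTM that embeds each binary, and a Siamese arrangement that compares two embeddings) can be illustrated with a minimal sketch. This is not the authors' implementation: the class name `TinyLSTM`, the hidden size, the byte-level input scaling, the random untrained weights, and the choice of cosine similarity as the distance are all assumptions made for illustration, since this page does not specify them. The only property it is meant to show is that both inputs pass through the same encoder with shared weights before being compared.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


class TinyLSTM:
    """Hypothetical single-layer LSTM over raw bytes (random, untrained weights)."""

    def __init__(self, hidden_size=8, seed=0):
        rng = random.Random(seed)
        self.hidden_size = hidden_size
        n_in = 1 + hidden_size + 1  # scaled byte, previous hidden state, bias term
        # One weight matrix per gate: input (i), forget (f), output (o), candidate (g).
        self.w = {
            gate: [[rng.uniform(-0.5, 0.5) for _ in range(n_in)]
                   for _ in range(hidden_size)]
            for gate in "ifog"
        }

    def _gate(self, name, x, activation):
        return [activation(sum(w * v for w, v in zip(row, x)))
                for row in self.w[name]]

    def embed(self, data: bytes):
        """Return the final hidden state as a fixed-length embedding."""
        h = [0.0] * self.hidden_size
        c = [0.0] * self.hidden_size
        for byte in data:
            x = [byte / 255.0] + h + [1.0]
            i = self._gate("i", x, sigmoid)
            f = self._gate("f", x, sigmoid)
            o = self._gate("o", x, sigmoid)
            g = self._gate("g", x, math.tanh)
            # Standard LSTM cell update: forget old state, add gated candidate.
            c = [fk * ck + ik * gk for fk, ck, ik, gk in zip(f, c, i, g)]
            h = [ok * math.tanh(ck) for ok, ck in zip(o, c)]
        return h


def cosine_similarity(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def siamese_similarity(encoder, binary_a: bytes, binary_b: bytes):
    """Siamese comparison: the SAME encoder (shared weights) embeds both inputs."""
    return cosine_similarity(encoder.embed(binary_a), encoder.embed(binary_b))


if __name__ == "__main__":
    encoder = TinyLSTM()
    sample_a = bytes([0x55, 0x48, 0x89, 0xE5, 0xC3])  # illustrative byte strings only
    sample_b = bytes([0x90, 0x90, 0xC3])
    print(siamese_similarity(encoder, sample_a, sample_a))
    print(siamese_similarity(encoder, sample_a, sample_b))
```

In a trained system the encoder weights would be learned so that similar binaries map to nearby embeddings. Here they are random, so only the structural property holds: an input compared with itself scores approximately 1.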

Keywords: Malware detection, binary analysis, LSTM, Siamese Neural Network, similarity detection

Received: 2021-05-27
Accepted: 2021-09-10
Published: 2021-09-14
Publisher: EAI

DOI: http://dx.doi.org/10.4108/eai.14-9-2021.170956

Copyright © 2021 Zhengping Luo et al., licensed to EAI. This is an open access article distributed under the terms of the Creative Commons Attribution license (http://creativecommons.org/licenses/by/3.0/), which permits unlimited use, distribution and reproduction in any medium so long as the original work is properly cited.
