
Research Article
Optir-SBERT: Cross-Architecture Binary Code Similarity Detection Based on Optimized LLVM IR
@INPROCEEDINGS{10.1007/978-3-031-56583-0_7,
  author={Yintong Yan and Lu Yu and Taiyan Wang and Yuwei Li and Zulie Pan},
  title={Optir-SBERT: Cross-Architecture Binary Code Similarity Detection Based on Optimized LLVM IR},
  proceedings={Digital Forensics and Cyber Crime. 14th EAI International Conference, ICDF2C 2023, New York City, NY, USA, November 30, 2023, Proceedings, Part II},
  proceedings_a={ICDF2C PART 2},
  year={2024},
  month={4},
  keywords={Binary code similarity detection, Cross-architecture, Optimized LLVM IR, SBERT, File-level vulnerability identification mechanism},
  doi={10.1007/978-3-031-56583-0_7}
}
- Yintong Yan
- Lu Yu
- Taiyan Wang
- Yuwei Li
- Zulie Pan
Year: 2024
ICDF2C PART 2
Springer
DOI: 10.1007/978-3-031-56583-0_7
Abstract
Cross-architecture binary code similarity detection plays an important role in many security domains. To address the low accuracy and poor scalability of existing cross-architecture detection techniques, we propose Optir-SBERT, the first technique to detect cross-architecture binary code similarity based on optimized LLVM IR. We also build a new, more diverse dataset, BinaryIR, which provides a benchmark for subsequent research based on LLVM IR. For cross-architecture binary code similarity detection, Optir-SBERT reaches an accuracy of 94.38%, of which optimization contributes 3.99%. For vulnerability detection, Optir-SBERT reaches an average accuracy of 93.9%, of which optimization contributes 7%. These results surpass those of existing state-of-the-art (SOTA) cross-architecture detection techniques. To improve the efficiency of vulnerability detection in realistic scenarios, we further introduce a file-level vulnerability identification mechanism on top of Optir-SBERT. The resulting model, Optir-SBERT-F, reduces detection time by 45.36% at the cost of a slight decrease in F-measure, substantially improving the efficiency of vulnerability detection.
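SBERT-style similarity detection ultimately compares fixed-length embedding vectors. The following is a minimal sketch, not the authors' implementation, assuming each optimized LLVM IR function has already been encoded into a vector (for example by an SBERT model). The embedding values, the helper names `cosine_similarity` and `is_similar`, and the 0.8 decision threshold are all hypothetical and chosen only for illustration.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Return the cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("a zero vector has no direction")
    return dot / (norm_a * norm_b)


def is_similar(a: Sequence[float], b: Sequence[float], threshold: float = 0.8) -> bool:
    # Hypothetical decision rule: treat a pair as compiled from the same
    # source function when their similarity meets the chosen threshold.
    return cosine_similarity(a, b) >= threshold


# Toy embeddings (hypothetical values): the same function lifted from
# x86 and ARM binaries, plus an unrelated function.
func_x86 = [0.9, 0.1, 0.3]
func_arm = [0.85, 0.15, 0.35]
unrelated = [-0.2, 0.9, -0.4]

if __name__ == "__main__":
    print(round(cosine_similarity(func_x86, func_arm), 4))
    print(is_similar(func_x86, func_arm), is_similar(func_x86, unrelated))
```

Cosine similarity is used here because it ignores vector magnitude and compares only direction, which is a common choice for sentence-embedding models; the actual scoring and threshold used by Optir-SBERT may differ.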