
Research Article
Identifying Library Functions in Stripped Binary: Combining Function Similarity and Call Graph Features
@INPROCEEDINGS{10.1007/978-3-031-64954-7_20,
  author={ZhanPeng Liu and Xinhui Han},
  title={Identifying Library Functions in Stripped Binary: Combining Function Similarity and Call Graph Features},
  proceedings={Security and Privacy in Communication Networks. 19th EAI International Conference, SecureComm 2023, Hong Kong, China, October 19-21, 2023, Proceedings, Part II},
  proceedings_a={SECURECOMM PART 2},
  year={2024},
  month={10},
  keywords={Binary Function Similarity Reverse Engineering Rust Programming Language},
  doi={10.1007/978-3-031-64954-7_20}
}
- ZhanPeng Liu
- Xinhui Han
Year: 2024
SECURECOMM PART 2
Springer
DOI: 10.1007/978-3-031-64954-7_20
Abstract
Reverse engineering binary programs without debug information, such as malware and embedded firmware, is often a challenging and time-consuming process that relies heavily on manual analysis. Automating the identification of frequently used library functions can significantly improve efficiency. While machine learning techniques have shown satisfactory results in computing binary function similarity in specific experimental contexts, their performance in open-set retrieval tasks remains largely unexplored. Notably, identifying known functions in stripped binaries falls under this category. To contribute to this area of research, we introduce a new dataset derived from popular Rust projects. This dataset not only aims to stimulate further research on Rust program analysis but also serves as a robust platform for evaluating the performance of state-of-the-art methods in open-set function retrieval tasks. Through our analysis, we discover that similarity-only methods have limited effectiveness in rejecting negative samples. To address this shortcoming, we present a novel approach that integrates features derived from function call graphs, enabling us to determine a function’s identity by considering both its similarity and its call relationships with other functions. Experimental results demonstrate that our method enhances overall performance compared to similarity-only solutions, especially under more challenging conditions.
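The abstract does not describe the model or the scoring rule, so the following is only a toy sketch of the general idea and not the authors' method. It shows how a similarity-only matcher can accept a negative sample that a call-graph check rejects. Every name here (`identify`, `best_match`, `threshold`, `weight`), the three-dimensional vectors, and the blended score are assumptions chosen for illustration.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def best_match(vec, library):
    """Return (name, similarity) of the most similar known library function."""
    return max(((name, cosine(vec, lv)) for name, lv in library.items()),
               key=lambda pair: pair[1])


def identify(queries, query_calls, library, library_calls,
             threshold=0.9, weight=0.5):
    """Label each query function with a library name, or None to reject it.

    Assumed scoring rule (hypothetical): blend embedding similarity with the
    fraction of the query's callees whose own best match is a callee of the
    candidate library function. Queries with no callees fall back to
    similarity alone.
    """
    # Pass 1: similarity-only nearest neighbour for every query function.
    first = {q: best_match(v, library) for q, v in queries.items()}

    result = {}
    for q, (name, sim) in first.items():
        callees = query_calls.get(q, [])
        if callees:
            expected = library_calls.get(name, set())
            agree = sum(1 for c in callees if first[c][0] in expected)
            support = agree / len(callees)
            score = (1 - weight) * sim + weight * support
        else:
            score = sim
        # Open-set decision: reject anything that scores below the threshold.
        result[q] = name if score >= threshold else None
    return result


# Hypothetical known library: embeddings and call edges.
library = {
    "memcpy": [1.0, 0.0, 0.0],
    "strlen": [0.0, 1.0, 0.0],
    "copy_str": [0.7, 0.7, 0.1],
}
library_calls = {"copy_str": {"memcpy", "strlen"}}

# Hypothetical stripped binary: f4 looks like copy_str but calls unrelated code.
queries = {
    "f1": [0.98, 0.05, 0.0],
    "f2": [0.05, 0.99, 0.0],
    "f3": [0.7, 0.7, 0.1],
    "f4": [0.7, 0.7, 0.2],
    "f5": [0.0, 0.0, 1.0],
}
query_calls = {"f3": ["f1", "f2"], "f4": ["f5"]}

result = identify(queries, query_calls, library, library_calls)
print(result)
```

In this toy data, `f4` is nearly identical to `copy_str` by cosine similarity, so a similarity-only threshold of 0.9 would accept it. Its single callee does not match either of `copy_str`'s callees, so the blended score drops below the threshold and `f4` is rejected, while `f3` is kept because both of its callees match.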