
Research Article
Malware detection method based on LLM to mine semantics of API
@ARTICLE{10.4108/airo.8880,
  author={Ronghao Hou and Xiaoping Tian and Guanggang Geng},
  title={malware detection method based on LLM to mine semantics of API},
  journal={EAI Endorsed Transactions on AI and Robotics},
  volume={4},
  number={1},
  publisher={EAI},
  journal_a={AIRO},
  year={2025},
  month={5},
  keywords={malware detection, deep learning, feature engineer, API sequence},
  doi={10.4108/airo.8880}
}
Ronghao Hou
Xiaoping Tian
Guanggang Geng
Year: 2025
AIRO
EAI
DOI: 10.4108/airo.8880
Abstract
In recent years, large language models (LLMs) have played a growing role in an increasing number of fields, including network security. Some attackers use LLMs to generate malicious code, write phishing emails, and analyze software for vulnerabilities. This inspires us to use LLMs to defend network security as well. In past research on malware detection, much of the feature engineering required analysis by experts, and this work is difficult and resource-intensive because malware is updated frequently. In this paper, we propose a malware detection method based on the intrinsic semantics of APIs. The method first designs an API intrinsic semantic feature encoder, which extracts intrinsic semantic features from API names and Microsoft's official API definitions using LLM prompt engineering and sentence embedding techniques. It then designs an API co-occurrence feature encoder, which mines contextual co-occurrence features of APIs from API call sequences using word2vec. The API semantic features and API co-occurrence features are combined to improve malware detection performance, and a TCN-GRU model captures dependencies between API calls. Results on several public datasets show that our method outperforms other methods, and ablation study results demonstrate the important role of intrinsic semantics in malware detection.
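As a rough illustration of how the two feature types described in the abstract might be combined, the sketch below joins a per-API semantic vector with a per-API co-occurrence vector for each call in a sequence. The abstract does not say how the features are combined, so concatenation is an assumption here, as are the function name `fuse_sequence`, the zero-vector fallback for unknown APIs, and the toy vector values, which stand in for the outputs of a sentence embedding model and a word2vec model.

```python
from typing import Dict, List

# Placeholder vectors for illustration only. In the described pipeline, the
# semantic vectors would come from an LLM-based sentence embedding and the
# co-occurrence vectors from a word2vec model trained on API call sequences.
SEMANTIC: Dict[str, List[float]] = {
    "CreateFileW": [0.9, 0.1],
    "WriteFile": [0.8, 0.2],
    "RegSetValueExW": [0.1, 0.7],
}
COOCCURRENCE: Dict[str, List[float]] = {
    "CreateFileW": [0.3, 0.5, 0.1],
    "WriteFile": [0.4, 0.4, 0.0],
    "RegSetValueExW": [0.0, 0.2, 0.9],
}


def fuse_sequence(
    calls: List[str],
    semantic: Dict[str, List[float]],
    cooccurrence: Dict[str, List[float]],
) -> List[List[float]]:
    """Concatenate the semantic and co-occurrence vectors of each API call.

    Concatenation is an assumed fusion strategy, not one confirmed by the
    abstract. APIs missing from a table fall back to a zero vector.
    """
    sem_dim = len(next(iter(semantic.values())))
    co_dim = len(next(iter(cooccurrence.values())))
    fused = []
    for name in calls:
        sem = semantic.get(name, [0.0] * sem_dim)
        co = cooccurrence.get(name, [0.0] * co_dim)
        fused.append(sem + co)  # list concatenation, one row per call
    return fused


# One fused feature row per API call, in call order.
features = fuse_sequence(
    ["CreateFileW", "WriteFile", "UnknownApi"], SEMANTIC, COOCCURRENCE
)
```

The resulting list of rows keeps the call order, which is what a sequence model such as the TCN-GRU mentioned in the abstract would consume as input.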
Copyright © 2025 R. Hou et al., licensed to EAI. This is an open access article distributed under the terms of the CC BY-NC-SA 4.0, which permits copying, redistributing, remixing, transformation, and building upon the material in any medium so long as the original work is properly cited.