Malware detection method based on LLM to mine semantics of API

Ronghao Hou; Xiaoping Tian; Guanggang Geng

AIRO 25(1)

Research Article

@ARTICLE{10.4108/airo.8880,
    author={Ronghao Hou and Xiaoping Tian and Guanggang Geng},
    title={malware detection method based on LLM to mine semantics of API},
    journal={EAI Endorsed Transactions on AI and Robotics},
    volume={4},
    number={1},
    publisher={EAI},
    journal_a={AIRO},
    year={2025},
    month={5},
    keywords={malware detection, deep learning, feature engineer, API sequence},
    doi={10.4108/airo.8880}
}

Ronghao Hou¹,*, Xiaoping Tian², Guanggang Geng¹

1: Jinan University
2: Beijing Normal University

*Contact email: hourh@stu2022.jnu.edu.cn

Abstract

In recent years, large language models (LLMs) have played an increasingly important role in a growing number of fields, including network security. Attackers already use LLMs to launch attacks, for example by generating malicious code and phishing emails and by analyzing software vulnerabilities. This motivates us to use LLMs to defend networks as well. Past research on malware detection has relied on extensive feature engineering carried out by experts, work that is difficult and resource-intensive because malware is updated frequently. In this paper, we propose a malware detection method based on the intrinsic semantics of APIs. The method first designs an API intrinsic semantic feature encoder, which uses LLM prompt engineering and sentence embedding techniques to extract intrinsic semantic features from API names and Microsoft's official API definitions. It then designs an API co-occurrence feature encoder, which uses word2vec to mine the contextual co-occurrence features of APIs from API call sequences. The API semantic features and API co-occurrence features are combined to improve detection performance, and a TCN-GRU network (combining a temporal convolutional network with a gated recurrent unit) captures dependencies between API calls. Results on several public datasets show that our method outperforms the compared methods, and ablation study results demonstrate the important role of intrinsic semantics in malware detection.
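To make the two encoders more concrete, here is a minimal Python sketch built on several assumptions that the abstract does not state. The prompt template and the `build_prompt`, `skipgram_pairs`, and `fuse_features` helpers are hypothetical names for illustration. word2vec is assumed to use its skip-gram variant, and the sketch only generates the (center, context) training pairs rather than training a model. The semantic and co-occurrence vectors are assumed to be fused by simple per-call concatenation. The LLM call, the sentence-embedding model, and the TCN-GRU classifier are omitted, and the vectors in the usage example are toy values.

```python
from typing import Dict, List, Sequence, Tuple

# Hypothetical prompt template: the paper's actual prompt wording is not given.
PROMPT_TEMPLATE = (
    "API name: {name}\n"
    "Official Microsoft definition: {definition}\n"
    "Summarize in one sentence what this API does and how malware might misuse it."
)


def build_prompt(api_name: str, definition: str) -> str:
    """Fill the hypothetical template; the LLM's answer would then be sentence-embedded."""
    return PROMPT_TEMPLATE.format(name=api_name, definition=definition)


def skipgram_pairs(sequence: Sequence[str], window: int = 2) -> List[Tuple[str, str]]:
    """Return the (center, context) pairs a skip-gram word2vec model trains on."""
    pairs: List[Tuple[str, str]] = []
    for i, center in enumerate(sequence):
        start = max(0, i - window)
        end = min(len(sequence), i + window + 1)
        for j in range(start, end):
            if j != i:  # an API call is not its own context
                pairs.append((center, sequence[j]))
    return pairs


def fuse_features(
    sequence: Sequence[str],
    semantic: Dict[str, List[float]],
    cooccurrence: Dict[str, List[float]],
) -> List[List[float]]:
    """Concatenate each call's semantic and co-occurrence vectors (assumed fusion)."""
    return [semantic[api] + cooccurrence[api] for api in sequence]


if __name__ == "__main__":
    calls = ["CreateFileW", "WriteFile", "CloseHandle"]
    # Illustrative definition text standing in for the official documentation.
    print(build_prompt("WriteFile", "Writes data to the specified file or I/O device."))
    print(skipgram_pairs(calls, window=1))
    # Toy vectors standing in for sentence-embedding and word2vec outputs.
    semantic = {"CreateFileW": [0.1, 0.2], "WriteFile": [0.3, 0.4], "CloseHandle": [0.5, 0.6]}
    cooc = {"CreateFileW": [1.0], "WriteFile": [2.0], "CloseHandle": [3.0]}
    print(fuse_features(calls, semantic, cooc))
    # [[0.1, 0.2, 1.0], [0.3, 0.4, 2.0], [0.5, 0.6, 3.0]]
```

The fused output is one vector per API call, so the sequence keeps its original order and length, which is what a sequence model such as TCN-GRU expects as input. Concatenation keeps the two feature views separate and lets the downstream network learn how to weight them; a weighted sum or attention-based fusion would be an equally plausible reading of "combined".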

Keywords: malware detection, deep learning, feature engineering, API sequence

Received: 2025-03-10
Accepted: 2025-04-17
Published: 2025-05-07
Publisher: EAI

http://dx.doi.org/10.4108/airo.8880

Copyright © 2025 R. Hou et al., licensed to EAI. This is an open access article distributed under the terms of the CC BY-NC-SA 4.0, which permits copying, redistributing, remixing, transformation, and building upon the material in any medium so long as the original work is properly cited.