airo 25(1):

Research Article

Malware detection method based on LLM to mine semantics of API

Cite
  • @ARTICLE{10.4108/airo.8880,
        author={Ronghao Hou and Xiaoping Tian and Guanggang Geng},
        title={malware detection method based on LLM to mine semantics of API},
        journal={EAI Endorsed Transactions on AI and Robotics},
        volume={4},
        number={1},
        publisher={EAI},
        journal_a={AIRO},
        year={2025},
        month={5},
        keywords={malware detection, deep learning, feature engineer, API sequence},
        doi={10.4108/airo.8880}
    }
    
  • Ronghao Hou
    Xiaoping Tian
    Guanggang Geng
    Year: 2025
    malware detection method based on LLM to mine semantics of API
    AIRO
    EAI
    DOI: 10.4108/airo.8880
Ronghao Hou1,*, Xiaoping Tian2, Guanggang Geng1
  • 1: Jinan University
  • 2: Beijing Normal University
*Contact email: hourh@stu2022.jnu.edu.cn

Abstract

In recent years, large language models (LLMs) have played a growing role in many fields, including network security. Attackers already use LLMs to generate malicious code, write phishing emails, and analyze software vulnerabilities, which motivates us to use LLMs for defense as well. Previous research on malware detection relied heavily on feature engineering performed by experts, work that is difficult and resource-intensive because malware is updated frequently. In this paper, we propose a malware detection method based on the intrinsic semantics of APIs. The method first designs an API intrinsic semantic feature encoder, which extracts intrinsic semantic features from API names and Microsoft's official API definitions using LLM prompt engineering and sentence embedding techniques. It then designs an API co-occurrence feature encoder, which mines contextual co-occurrence features of APIs from API call sequences using word2vec. The API semantic features and API co-occurrence features are combined to improve malware detection performance, and a TCN-GRU network captures dependencies between API calls. Results on several public datasets show that our method outperforms other methods, and ablation study results demonstrate the important role of intrinsic semantics in malware detection.
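The feature-fusion idea in the abstract can be illustrated with a small sketch. This is not the authors' implementation: it swaps word2vec for plain sliding-window co-occurrence counts, and the semantic vector is a hypothetical placeholder standing in for an LLM-derived sentence embedding. The function names, the window size, and the example API sequences are all assumptions made for illustration.

```python
from collections import Counter
from typing import Dict, List, Sequence, Tuple

Pair = Tuple[str, str]


def cooccurrence_counts(sequences: Sequence[Sequence[str]], window: int = 1) -> Dict[Pair, int]:
    """Count how often two API names occur within `window` calls of each other."""
    counts: Counter = Counter()
    for seq in sequences:
        for i, api in enumerate(seq):
            # Look ahead at most `window` positions in the same call sequence.
            for j in range(i + 1, min(i + 1 + window, len(seq))):
                counts[tuple(sorted((api, seq[j])))] += 1
    return counts


def cooccurrence_vector(api: str, vocab: Sequence[str], counts: Dict[Pair, int]) -> List[float]:
    """One row of the co-occurrence matrix (a simple stand-in for a word2vec vector)."""
    return [float(counts.get(tuple(sorted((api, other))), 0)) for other in vocab]


def fuse(semantic: Sequence[float], cooccurrence: Sequence[float]) -> List[float]:
    """Concatenate the semantic and co-occurrence features of one API."""
    return list(semantic) + list(cooccurrence)


# Hypothetical API call traces from two samples.
traces = [
    ["CreateFileW", "WriteFile", "CloseHandle"],
    ["CreateFileW", "ReadFile", "CloseHandle"],
]
vocab = sorted({api for trace in traces for api in trace})
counts = cooccurrence_counts(traces, window=1)

# Placeholder semantic vector; in the paper this role is played by an
# embedding of the API name and its official definition.
semantic_placeholder = [0.5, -0.5]
fused = fuse(semantic_placeholder, cooccurrence_vector("CreateFileW", vocab, counts))
print(vocab)
print(fused)
```

In a fuller pipeline, one fused vector per API call would form the input sequence to a sequence model such as the TCN-GRU mentioned in the abstract.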

Keywords: malware detection, deep learning, feature engineer, API sequence
Received: 2025-03-10
Accepted: 2025-04-17
Published: 2025-05-07
Publisher: EAI
http://dx.doi.org/10.4108/airo.8880

Copyright © 2025 R. Hou et al., licensed to EAI. This is an open access article distributed under the terms of the CC BY-NC-SA 4.0, which permits copying, redistributing, remixing, transformation, and building upon the material in any medium so long as the original work is properly cited.
