Security and Privacy in Communication Networks. 18th EAI International Conference, SecureComm 2022, Virtual Event, October 2022, Proceedings

Research Article

Cost-Effective Malware Classification Based on Deep Active Learning

Cite
  • @INPROCEEDINGS{10.1007/978-3-031-25538-0_12,
        author={Qian Qiang and Yige Chen and Yang Hu and Tianning Zang and Mian Cheng and Quanbo Pan and Yu Ding and Zisen Qi},
        title={Cost-Effective Malware Classification Based on Deep Active Learning},
        proceedings={Security and Privacy in Communication Networks. 18th EAI International Conference, SecureComm 2022, Virtual Event, October 2022, Proceedings},
        proceedings_a={SECURECOMM},
        year={2023},
        month={2},
        keywords={Deep active learning, Malware classification, Cost-effective},
        doi={10.1007/978-3-031-25538-0_12}
    }
    
Qian Qiang1,*, Yige Chen1, Yang Hu2, Tianning Zang1, Mian Cheng, Quanbo Pan1, Yu Ding1, Zisen Qi1
  • 1: Institute of Information Engineering
  • 2: Haier (Beijing) IC Design Co.
*Contact email: qiangqian@iie.ac.cn

Abstract

Malware has grown into one of the most serious threats to internet security. As the number of malware families has increased rapidly, a malware classification model is needed to classify samples for further analysis. Recent success in deep learning-based malware classification, however, relies heavily on a large number of labeled training samples, which may require considerable human effort. In this paper, we propose a novel malware classification framework that addresses this labeling cost: it builds a competitive classifier from a limited number of labeled training instances in an incremental learning manner. A cost-effective sample selection strategy focuses expert effort on labeling the samples that are most informative for the classifier. We first convert malware byte sequences into fixed-size grayscale images through data visualization. Then, following a strategy designed for informative malware acquisition, we use a Convolutional Neural Network (ConvNet) to select samples according to their estimated gradients with respect to the last linear layer, and query experts to annotate them. The updated labeled dataset is then fed into the network for progressive fine-tuning. To evaluate how well our method acquires informative malware from a pool of unknown samples, we conduct a series of experiments on the BIG 2015 benchmark dataset. Compared with random selection and other existing high-performance strategies, the proposed system achieves promising performance gains cost-effectively, with less labeling effort wasted. We also analyze the effectiveness of sample selection across different families, which further demonstrates the efficiency in labeling cost. Moreover, we study the initialization methods and the pre-defined number of samples queried for practical implementation.
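The two steps named in the abstract (byte-to-image conversion and gradient-based sample selection) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the abstract does not give the image width, the resizing method, or the exact scoring rule. The function names, the 64-pixel defaults, the nearest-neighbour resize, and the use of the predicted class as a pseudo-label are all assumptions made for illustration. Under those assumptions, the cross-entropy gradient with respect to the last linear layer's weights is the outer product of (softmax probabilities minus one-hot pseudo-label) and the penultimate features, so its norm factors into the product of the two vector norms.

```python
import numpy as np

def bytes_to_gray_image(data: bytes, width: int = 64, size: int = 64) -> np.ndarray:
    """Map raw bytes to a fixed-size (size x size) uint8 grayscale image.

    Hypothetical helper: width, size, and the resize method are assumptions.
    """
    if not data:
        return np.zeros((size, size), dtype=np.uint8)
    arr = np.frombuffer(data, dtype=np.uint8)
    rows = -(-arr.size // width)  # ceiling division
    padded = np.zeros(rows * width, dtype=np.uint8)
    padded[:arr.size] = arr  # zero-pad the final row
    img = padded.reshape(rows, width)
    # Nearest-neighbour resize to a fixed square shape.
    r_idx = np.arange(size) * rows // size
    c_idx = np.arange(size) * width // size
    return img[np.ix_(r_idx, c_idx)]

def last_layer_grad_norms(features: np.ndarray, logits: np.ndarray) -> np.ndarray:
    """Norm of the cross-entropy gradient w.r.t. the last linear layer's weights,
    using the predicted class as a pseudo-label (an assumption)."""
    z = logits - logits.max(axis=1, keepdims=True)  # numerically stable softmax
    p = np.exp(z)
    p /= p.sum(axis=1, keepdims=True)
    g = p.copy()
    g[np.arange(len(p)), p.argmax(axis=1)] -= 1.0  # p - onehot(pseudo-label)
    # ||g h^T||_F = ||g|| * ||h||, so the outer product is never materialized.
    return np.linalg.norm(g, axis=1) * np.linalg.norm(features, axis=1)

def select_queries(features: np.ndarray, logits: np.ndarray, budget: int) -> np.ndarray:
    """Return indices of the `budget` pool samples with the largest gradient norm."""
    scores = last_layer_grad_norms(features, logits)
    return np.argsort(-scores)[:budget]
```

In an active learning loop, `features` and `logits` would come from the current ConvNet evaluated on the unlabeled pool; the selected indices would be sent to experts for annotation, and the network would then be fine-tuned on the enlarged labeled set. Ranking by gradient norm alone favors uncertain samples; a practical strategy may also need to account for diversity among the selected samples.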

Keywords
Deep active learning; Malware classification; Cost-effective
Published
2023-02-04
Appears in
SpringerLink
http://dx.doi.org/10.1007/978-3-031-25538-0_12