Synthetic Malware Using Deep Variational Autoencoders and Generative Adversarial Networks

Aaron Choi; Albert Giang; Sajit Jumani; David Luong; Fabio Di Troia

IoT 23(1):

Research Article


Cite: BibTeX Plain Text

@ARTICLE{10.4108/eetiot.6566,
    author={Aaron Choi and Albert Giang and Sajit Jumani and David Luong and Fabio Di Troia},
    title={Synthetic Malware Using Deep Variational Autoencoders and Generative Adversarial Networks},
    journal={EAI Endorsed Transactions on Internet of Things},
    volume={10},
    number={1},
    publisher={EAI},
    journal_a={IOT},
    year={2024},
    month={7},
    keywords={Malware, Synthetic Malware, GAN, VAE},
    doi={10.4108/eetiot.6566}
}

Aaron Choi, Albert Giang, Sajit Jumani, David Luong, Fabio Di Troia. 2024. Synthetic Malware Using Deep Variational Autoencoders and Generative Adversarial Networks. IOT. EAI. DOI: 10.4108/eetiot.6566

Aaron Choi¹, Albert Giang¹, Sajit Jumani¹, David Luong¹, Fabio Di Troia¹,*

1: San Jose State University

*Contact email: fabio.ditroia@sjsu.edu

Abstract

The effectiveness of malware detection depends heavily on the quality of the training dataset, particularly its size and authenticity. However, the lack of high-quality training data remains one of the biggest obstacles to the widespread adoption of malware detection based on machine learning and deep learning models. In response, researchers have begun using generative techniques to create synthetic malware samples. This work uses deep variational autoencoders (VAEs) and generative adversarial networks (GANs) to produce malware samples as opcode sequences. The generated opcode sequences are then validated by testing whether machine learning and deep learning classifiers can distinguish them from authentic opcode samples. The primary objective of this study was to compare synthetic malware generated with VAEs and GANs. The results showed that neither approach produced synthetic malware that could deceive machine learning classifiers. However, the Wasserstein GAN with gradient penalty (WGAN-GP) showed more promise: a larger number of its synthetic samples had to be included in the training set before they could be detected effectively, indicating that it is the better of the two approaches for synthetic malware generation.
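The validation idea described in the abstract (train a classifier to separate authentic opcode sequences from generated ones, then check how well it succeeds) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the five-opcode vocabulary, the toy data (a cyclic pattern standing in for authentic samples, uniform random draws standing in for generator output), the bigram features, and the nearest-centroid classifier. None of these are taken from the paper, which does not specify its features or models on this page.

```python
import math
import random
from collections import Counter
from itertools import product

# Hypothetical toy vocabulary; real opcode sets are far larger.
OPCODES = ["push", "mov", "call", "pop", "ret"]
BIGRAMS = list(product(OPCODES, repeat=2))


def bigram_features(seq):
    """Return the normalized bigram frequency vector of an opcode sequence."""
    counts = Counter(zip(seq, seq[1:]))
    total = max(sum(counts.values()), 1)  # avoid division by zero
    return [counts[b] / total for b in BIGRAMS]


def centroid(vectors):
    """Component-wise mean of a list of equal-length vectors."""
    return [sum(col) / len(vectors) for col in zip(*vectors)]


def predict(vec, centroids):
    """Return the label whose centroid is nearest in Euclidean distance."""
    return min(centroids, key=lambda label: math.dist(vec, centroids[label]))


def make_toy_data(n, length, rng):
    """Build labeled toy sequences (stand-ins, not real malware data)."""
    data = []
    for _ in range(n):
        start = rng.randrange(len(OPCODES))
        # "Authentic" stand-in: a fixed cyclic opcode pattern.
        authentic = [OPCODES[(start + i) % len(OPCODES)] for i in range(length)]
        # "Synthetic" stand-in: opcodes drawn uniformly at random.
        generated = [rng.choice(OPCODES) for _ in range(length)]
        data.append((authentic, "authentic"))
        data.append((generated, "synthetic"))
    return data


def evaluate(seed=0):
    """Fit centroids on a training split and return held-out accuracy."""
    rng = random.Random(seed)
    train = make_toy_data(40, 60, rng)
    test = make_toy_data(20, 60, rng)
    centroids = {
        label: centroid([bigram_features(s) for s, y in train if y == label])
        for label in ("authentic", "synthetic")
    }
    hits = sum(predict(bigram_features(s), centroids) == y for s, y in test)
    return hits / len(test)


if __name__ == "__main__":
    print(f"held-out accuracy: {evaluate():.2f}")
```

Under this setup, an accuracy close to 0.5 on a balanced test set would mean the classifier cannot tell generated sequences from authentic ones, which is the outcome a generator aims for. A high accuracy means the generated samples are easy to detect.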

Keywords: Malware, Synthetic Malware, GAN, VAE

Received: 2024-02-15
Accepted: 2024-06-30
Published: 2024-07-09
Publisher: EAI

DOI link: http://dx.doi.org/10.4108/eetiot.6566

Copyright © 2024 A. Choi et al., licensed to EAI. This is an open access article distributed under the terms of the CC BY-NC-SA 4.0, which permits copying, redistributing, remixing, transformation, and building upon the material in any medium so long as the original work is properly cited.
