Research Article
Synthetic Malware Using Deep Variational Autoencoders and Generative Adversarial Networks
@ARTICLE{10.4108/eetiot.6566,
  author={Aaron Choi and Albert Giang and Sajit Jumani and David Luong and Fabio Di Troia},
  title={Synthetic Malware Using Deep Variational Autoencoders and Generative Adversarial Networks},
  journal={EAI Endorsed Transactions on Internet of Things},
  volume={10},
  number={1},
  publisher={EAI},
  journal_a={IOT},
  year={2024},
  month={7},
  keywords={Malware, Synthetic Malware, GAN, VAE},
  doi={10.4108/eetiot.6566}
}
- Aaron Choi
- Albert Giang
- Sajit Jumani
- David Luong
- Fabio Di Troia
Year: 2024
Journal: IOT
Publisher: EAI
DOI: 10.4108/eetiot.6566
Abstract
The effectiveness of malicious-file detection relies heavily on the quality of the training dataset, particularly its size and authenticity. However, the lack of high-quality training data remains one of the biggest obstacles to the widespread adoption of malware detection based on trained machine learning and deep learning models. In response, researchers have made initial strides by using generative techniques to create synthetic malware samples. This work uses deep variational autoencoders (VAEs) and generative adversarial networks (GANs) to produce malware samples as opcode sequences. As validation, machine learning and deep learning techniques are then used to distinguish the generated opcode sequences from authentic ones. The primary objective of this study was to compare synthetic malware generated with VAEs and GANs. The results showed that neither approach could create synthetic malware capable of deceiving machine learning classification. However, the WGAN-GP algorithm (Wasserstein GAN with gradient penalty) showed more promise: a higher number of its synthetic samples was required in the training set before they could be detected effectively, which suggests it is the better of the two approaches for synthetic malware generation.
Copyright © 2024 A. Choi et al., licensed to EAI. This is an open access article distributed under the terms of the CC BY-NC-SA 4.0 license, which permits copying, redistributing, remixing, transforming, and building upon the material in any medium so long as the original work is properly cited.