
Research Article
Best-Effort Adversarial Approximation of Black-Box Malware Classifiers
@INPROCEEDINGS{10.1007/978-3-030-63086-7_18,
  author={Abdullah Ali and Birhanu Eshete},
  title={Best-Effort Adversarial Approximation of Black-Box Malware Classifiers},
  proceedings={Security and Privacy in Communication Networks. 16th EAI International Conference, SecureComm 2020, Washington, DC, USA, October 21-23, 2020, Proceedings, Part I},
  proceedings_a={SECURECOMM},
  year={2020},
  month={12},
  keywords={Model extraction, Model stealing, Adversarial machine learning},
  doi={10.1007/978-3-030-63086-7_18}
}
Abdullah Ali
Birhanu Eshete
Year: 2020
SECURECOMM
Springer
DOI: 10.1007/978-3-030-63086-7_18
Abstract
An adversary who aims to steal a black-box model repeatedly queries it via a prediction API to learn its decision boundary. Adversarial approximation is non-trivial because of the enormous number of alternative model architectures, parameters, and features to explore. In this context, the adversary resorts to a best-effort strategy that yields the closest approximation. This paper explores best-effort adversarial approximation of a black-box malware classifier in the most challenging setting, where the adversary’s knowledge is limited to the label returned for a given input. Beginning with a limited input set, we leverage feature representation mapping and cross-domain transferability to locally approximate a black-box malware classifier. We do so with different feature types for the target and the substitute model while also using non-overlapping data for training the target, training the substitute, and comparing the two. Against a Convolutional Neural Network (CNN) trained on raw byte sequences of Windows Portable Executables (PEs), our approach achieves a 92% accurate substitute (trained on pixel representations of PEs) and nearly 90% prediction agreement between the target and the substitute model. Against a 97.8% accurate gradient boosted decision tree trained on static PE features, our 91% accurate substitute agrees with the black-box on 90% of predictions, suggesting the strength of our purely black-box approximation.
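The two quantities the abstract reports, substitute accuracy and target-substitute prediction agreement, can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names `label_with_black_box` and `prediction_agreement`, and the toy threshold classifiers standing in for the target and the substitute, are assumptions made purely for illustration. The only property taken from the abstract is that the adversary sees nothing but the label returned for each query.

```python
from typing import Callable, List, Sequence


def label_with_black_box(query: Callable[[float], int],
                         samples: Sequence[float]) -> List[int]:
    """Query a label-only prediction API once per sample.

    `query` is a hypothetical stand-in for the black-box target; only the
    returned label is observed (no scores, gradients, or features).
    """
    return [query(sample) for sample in samples]


def prediction_agreement(target_labels: Sequence[int],
                         substitute_labels: Sequence[int]) -> float:
    """Fraction of inputs on which the target and the substitute agree."""
    if len(target_labels) != len(substitute_labels):
        raise ValueError("label sequences must have the same length")
    if len(target_labels) == 0:
        raise ValueError("label sequences must be non-empty")
    matches = sum(t == s for t, s in zip(target_labels, substitute_labels))
    return matches / len(target_labels)


if __name__ == "__main__":
    # Toy stand-ins (assumptions): two threshold rules on a single number.
    def target(x: float) -> int:
        return int(x >= 0.5)

    def substitute(x: float) -> int:
        return int(x >= 0.6)

    # Held-out comparison inputs, disjoint from any training data.
    held_out = [0.1, 0.3, 0.55, 0.7, 0.9]
    ground_truth = [0, 0, 1, 1, 1]

    target_labels = label_with_black_box(target, held_out)
    substitute_labels = label_with_black_box(substitute, held_out)

    # Accuracy is agreement with ground truth; agreement compares the models.
    print("substitute accuracy:",
          prediction_agreement(ground_truth, substitute_labels))
    print("target-substitute agreement:",
          prediction_agreement(target_labels, substitute_labels))
```

Accuracy and agreement are computed by the same function here because both are match rates; they differ only in whether the reference labels come from ground truth or from the black-box target.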