10th EAI International Conference on Bio-inspired Information and Communications Technologies (formerly BIONETICS)

Research Article

This Malware Looks Familiar: Laymen Identify Malware Run-time Similarity with Chernoff faces and Stick Figures

  • @INPROCEEDINGS{10.4108/eai.22-3-2017.152417,
        author={Nathan VanHoudnos and William Casey and David French and Brian Lindauer and Eliezer Kanal and Evan Wright and Bronwyn Woods and Seungwhan Moon and Peter Jansen and Jamie Carbonell},
        title={This Malware Looks Familiar: Laymen Identify Malware Run-time Similarity with Chernoff faces and Stick Figures},
        proceedings={10th EAI International Conference on Bio-inspired Information and Communications Technologies (formerly BIONETICS)},
        publisher={EAI},
        proceedings_a={BICT},
        year={2017},
        month={3},
        keywords={malware classification, Chernoff faces, active learning, machine learning},
        doi={10.4108/eai.22-3-2017.152417}
    }
    
Nathan VanHoudnos1, William Casey1, David French1, Brian Lindauer1, Eliezer Kanal,*, Evan Wright2, Bronwyn Woods3, Seungwhan Moon4, Peter Jansen5, Jamie Carbonell4
  • 1: Software Engineering Institute, Carnegie Mellon University
  • 2: Anomali Inc
  • 3: Turnitin
  • 4: Language Technologies Institute, Carnegie Mellon University
  • 5: Language Technologies Institute, Carnegie Mellon University
*Contact email: ekanal@cert.org

Abstract

Classifying unknown malicious binaries into malware families provides valuable information to security professionals. However, the reverse engineering needed to assign a binary to a known family is expensive because it consumes the time of human experts. In this work, we give a proof-of-concept approach to visualizing malware so that non-experts can distinguish between three heterogeneous families of malware with minimal training. We present this work as a first step towards a human-in-the-loop active learning system for malware analysis. To do so, we (1) curated a dataset of malware variants and labeled them using expert malware reverse engineering, (2) instrumented the runtime behavior of these variants, (3) constructed a simple, graph-based feature set from the runtime behavior, and (4) visualized low-dimensional representations of these system call graphs with stick figures and Chernoff faces. We then selected the three families with the largest within-family variation and asked non-experts on Amazon Mechanical Turk to classify binaries among these three families using the generated visual representations. Non-experts completed the task with between 63% and 86% accuracy, and, when aggregated, their labels trained a classifier to a level of performance similar to that achieved with the ground truth labels. Moreover, the information from the experiments yielded new insights into the variation within one of the malware families.
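
The abstract says the non-expert labels were aggregated before training a classifier, but it does not name the aggregation rule. As an illustration only, the sketch below assumes a simple majority vote per binary. The function names, the family names, and the sample votes are hypothetical and are not taken from the paper.

```python
from collections import Counter


def majority_vote(votes):
    """Return the most common label among the votes for one binary.

    Assumption: majority vote stands in for the paper's unspecified
    aggregation step. Ties go to the label that appears first in the list.
    """
    if not votes:
        raise ValueError("at least one vote is required")
    return Counter(votes).most_common(1)[0][0]


def aggregate_labels(votes_by_binary):
    """Collapse each binary's list of worker votes into a single label."""
    return {name: majority_vote(votes) for name, votes in votes_by_binary.items()}


def accuracy(predicted, truth):
    """Fraction of binaries whose aggregated label matches the expert label."""
    hits = sum(1 for name, label in truth.items() if predicted.get(name) == label)
    return hits / len(truth)


# Hypothetical worker votes for three binaries (made-up data).
votes_by_binary = {
    "sample_1": ["family_A", "family_A", "family_B"],
    "sample_2": ["family_C", "family_B", "family_C"],
    "sample_3": ["family_B", "family_A", "family_A"],
}
# Hypothetical expert (ground truth) labels for the same binaries.
expert_labels = {
    "sample_1": "family_A",
    "sample_2": "family_C",
    "sample_3": "family_B",
}

aggregated = aggregate_labels(votes_by_binary)
print(aggregated)
print(accuracy(aggregated, expert_labels))
```

The aggregated labels could then replace expert labels as training targets for any standard classifier, which is the comparison the abstract reports.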

Keywords
malware classification, Chernoff faces, active learning, machine learning
Published
2017-03-22
Publisher
EAI
http://dx.doi.org/10.4108/eai.22-3-2017.152417