Digital Forensics and Cyber Crime. 10th International EAI Conference, ICDF2C 2018, New Orleans, LA, USA, September 10–12, 2018, Proceedings

Research Article

AndroParse - An Android Feature Extraction Framework and Dataset

  • @INPROCEEDINGS{10.1007/978-3-030-05487-8_4,
        author={Robert Schmicker and Frank Breitinger and Ibrahim Baggili},
        title={AndroParse - An Android Feature Extraction Framework and Dataset},
        proceedings={Digital Forensics and Cyber Crime. 10th International EAI Conference, ICDF2C 2018, New Orleans, LA, USA, September 10--12, 2018, Proceedings},
        proceedings_a={ICDF2C},
        year={2019},
        month={1},
        keywords={AndroParse Android Malware Dataset Features Framework},
        doi={10.1007/978-3-030-05487-8_4}
    }
    
  • Robert Schmicker
    Frank Breitinger
    Ibrahim Baggili
    Year: 2019
    AndroParse - An Android Feature Extraction Framework and Dataset
    ICDF2C
    Springer
    DOI: 10.1007/978-3-030-05487-8_4
Robert Schmicker1,*, Frank Breitinger1,*, Ibrahim Baggili1,*
  • 1: University of New Haven
*Contact email: rschm2@unh.newhaven.edu, FBreitinger@newhaven.edu, IBaggili@newhaven.edu

Abstract

Android malware has become a major challenge. As a consequence, practitioners and researchers spend a significant amount of time analyzing Android applications (APKs). A common procedure (especially for data scientists) is to extract features such as permissions, APIs, or strings, which can then be analyzed. Current state-of-the-art tools have three major issues: (1) no single tool can extract all the significant features used by scientists and practitioners, (2) current tools are not designed to be extensible, and (3) existing parsers can be time-consuming, as they are not runtime-efficient or scalable. Therefore, this work presents AndroParse, an open-source Android parser written in Golang that currently extracts the four most common features: Permissions, APIs, Strings, and Intents. AndroParse outputs JSON files, as they can easily be used by most major programming languages. Constructing the parser allowed us to create an extensive feature dataset, which can be accessed through our independent REST API. Our dataset currently has 67,703 benign and 46,683 malicious APK samples.
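
To make the idea of a per-APK JSON feature file concrete, the following minimal Go sketch shows what such a record might look like. It is not AndroParse's actual schema: the type APKFeatures, its field names, the JSON keys, and the helper functions encodeFeatures and decodeFeatures are all hypothetical, chosen only to mirror the four feature groups named in the abstract. The sample values are placeholders, not data from the dataset.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// APKFeatures is a hypothetical record layout for one parsed APK.
// The field names and JSON keys are assumptions for illustration;
// the real AndroParse output format may differ.
type APKFeatures struct {
	SHA256      string   `json:"sha256"`      // placeholder sample identifier
	Malicious   bool     `json:"malicious"`   // benign/malicious label
	Permissions []string `json:"permissions"` // requested permissions
	APIs        []string `json:"apis"`        // API calls found in the code
	Strings     []string `json:"strings"`     // string constants
	Intents     []string `json:"intents"`     // declared intents
}

// encodeFeatures serializes a record to indented JSON.
func encodeFeatures(f APKFeatures) (string, error) {
	b, err := json.MarshalIndent(f, "", "  ")
	if err != nil {
		return "", err
	}
	return string(b), nil
}

// decodeFeatures parses a JSON document back into a record,
// as a consumer of such feature files might do.
func decodeFeatures(s string) (APKFeatures, error) {
	var f APKFeatures
	err := json.Unmarshal([]byte(s), &f)
	return f, err
}

func main() {
	sample := APKFeatures{
		SHA256:      "placeholder-hash",
		Malicious:   false,
		Permissions: []string{"android.permission.INTERNET"},
		APIs:        []string{"android.telephony.TelephonyManager.getDeviceId"},
		Strings:     []string{"http://example.com"},
		Intents:     []string{"android.intent.action.MAIN"},
	}

	out, err := encodeFeatures(sample)
	if err != nil {
		fmt.Println("encode error:", err)
		return
	}
	fmt.Println(out)
}
```

Because the record is plain JSON, a consumer in another language could read the same file with its standard JSON library, which is the portability argument the abstract makes for choosing this output format.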