
Research Article
Identification of Significant Permissions for Efficient Android Malware Detection
@INPROCEEDINGS{10.1007/978-3-030-68737-3_3,
  author={Hemant Rathore and Sanjay K. Sahay and Ritvik Rajvanshi and Mohit Sewak},
  title={Identification of Significant Permissions for Efficient Android Malware Detection},
  proceedings={Broadband Communications, Networks, and Systems. 11th EAI International Conference, BROADNETS 2020, Qingdao, China, December 11--12, 2020, Proceedings},
  proceedings_a={BROADNETS},
  year={2021},
  month={2},
  keywords={Android malware, Deep neural network, Machine learning, Malware detection, Static analysis},
  doi={10.1007/978-3-030-68737-3_3}
}
Hemant Rathore
Sanjay K. Sahay
Ritvik Rajvanshi
Mohit Sewak
Year: 2021
BROADNETS
Springer
DOI: 10.1007/978-3-030-68737-3_3
Abstract
Since Google unveiled the Android OS for smartphones, malware has thrived along the 3Vs: volume, velocity, and variety. A recent report indicates that one out of every five business/industry mobile applications leaks sensitive personal data. Traditional signature- and heuristic-based malware detection systems cannot cope with current malware challenges, which leaves the Android ecosystem under threat. Researchers have therefore recently started exploring machine learning and deep learning based malware detection systems. In this paper, we perform a comprehensive feature analysis to identify the significant Android permissions and propose an efficient Android malware detection system using machine learning and deep neural networks. We constructed a set of 16 permissions (8% of the total set), derived from variance threshold, auto-encoders, and principal component analysis, to build a malware detection engine that consumes less training and testing time without significant compromise on model accuracy. Our experimental results show that the Android malware detection model based on the random forest classifier is the most balanced and achieves the highest area under the curve score of 97.7%, which is better than the current state-of-the-art systems. We also observed that deep neural networks attain accuracy comparable to the baseline results but with a massive computational penalty.
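The variance-threshold step named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the permission names, the toy app matrix, the threshold value, and the helper `select_by_variance` are all hypothetical and chosen only to show the idea that permissions requested by nearly all apps, or by nearly none, carry little information and can be dropped.

```python
from statistics import pvariance


def select_by_variance(rows, names, threshold):
    """Return the feature names whose population variance exceeds threshold.

    rows  -- list of per-app feature vectors (1 = permission requested)
    names -- feature names, in the same order as the columns of rows
    """
    columns = zip(*rows)  # transpose: one tuple per permission
    return [name for name, col in zip(names, columns)
            if pvariance(col) > threshold]


# Hypothetical toy data (assumption, not from the paper):
# each row is one app, each column one permission.
PERMISSIONS = ["INTERNET", "SEND_SMS", "READ_CONTACTS", "VIBRATE"]
APPS = [
    [1, 1, 0, 0],
    [1, 0, 1, 0],
    [1, 1, 1, 0],
    [1, 0, 0, 0],
]

# INTERNET (always 1) and VIBRATE (always 0) have zero variance and are dropped.
selected = select_by_variance(APPS, PERMISSIONS, threshold=0.1)
print(selected)
```

In this toy matrix only the two permissions that vary across apps survive the filter. The paper combines such a filter with auto-encoders and principal component analysis to arrive at its 16 permissions; those steps and the actual threshold are not shown here.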