Proceedings of The 6th Asia-Pacific Education And Science Conference, AECon 2020, 19-20 December 2020, Purwokerto, Indonesia

Research Article

Imbalanced Data Analysis of Adolescent Risk Behavior of Drug Abuse using Random Forest

  • @INPROCEEDINGS{10.4108/eai.19-12-2020.2309145,
        author={Ismaini  Zain and Kartika  Fithiasari and Erma Oktania Permatasari and Tyas Ajeng Nastiti and Mardyono  Mardyono and Nilam Novita Sari and Resty  Pujihasvuty and Sri Lilestina Nasution},
        title={Imbalanced Data Analysis of Adolescent Risk Behavior of Drug Abuse using Random Forest},
        proceedings={Proceedings of The 6th Asia-Pacific Education And Science Conference, AECon 2020, 19-20 December 2020, Purwokerto, Indonesia},
        publisher={EAI},
        proceedings_a={AECON},
        year={2021},
        month={8},
        keywords={adolescent risk behavior drug abuse imbalanced data random forest smote-n},
        doi={10.4108/eai.19-12-2020.2309145}
    }
    
Ismaini Zain1,*, Kartika Fithiasari1, Erma Oktania Permatasari1, Tyas Ajeng Nastiti2, Mardyono Mardyono3, Nilam Novita Sari1, Resty Pujihasvuty1, Sri Lilestina Nasution1
  • 1: Department of Statistics, Faculty of Science and Data Analytics, Institut Teknologi Sepuluh Nopember, Surabaya, Indonesia
  • 2: Department of Visual Communication Design, Universitas International Semen Indonesia, Gresik, Indonesia
  • 3: National Population and Family Planning Board, East Java, Indonesia
*Contact email: ismaini_z@statistika.its.ac.id

Abstract

Adolescence is a period of self-discovery and of vulnerability to risky behavior such as drug abuse. In Indonesia, the number of drug abuse cases among adolescents is high, so it is important to identify the factors behind it, which can be done with a classification method such as random forest. The data used in this research are SKAP data on adolescent risk behavior of drug abuse. The prevalence of drug abuse among adolescents in the data is 4.1%, which indicates an imbalanced class distribution. This imbalance is handled by applying SMOTE-N, the variant of SMOTE (Synthetic Minority Over-sampling Technique) for nominal features. This study classifies adolescent risk behavior of drug abuse using random forest combined with SMOTE-N to handle the class imbalance. The results show that the model with SMOTE-N performs better because it increases specificity and G-mean. The variables that affect the classification of drug abuse among adolescents are age, sex, and psychological consequence.
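The abstract combines two steps that are easier to follow with a concrete example: oversampling nominal (categorical) features with SMOTE-N, and judging the result by specificity and G-mean, where G-mean = sqrt(sensitivity × specificity). The Python sketch below illustrates both under stated assumptions. It uses a simplified, unweighted value difference metric (VDM) for the nearest-neighbour search, builds each synthetic row by a per-feature majority vote over a randomly chosen minority row and its k nearest minority neighbours, and treats the drug-abuse class as the "positive" class. The function names, toy rows and feature labels are hypothetical and not taken from the paper or the SKAP data, and the random forest step is omitted. In practice a maintained implementation such as `SMOTEN` in the imbalanced-learn library would normally be used instead.

```python
import math
import random
from collections import Counter


def vdm_tables(X, y):
    """For each feature, map each value to its class-conditional frequencies."""
    classes = sorted(set(y))
    tables = []
    for j in range(len(X[0])):
        value_counts = Counter(row[j] for row in X)
        joint = Counter((row[j], label) for row, label in zip(X, y))
        tables.append({v: [joint[(v, c)] / n for c in classes]
                       for v, n in value_counts.items()})
    return tables


def vdm_distance(a, b, tables):
    """Simplified VDM (exponent 1, no instance weights) between two nominal rows."""
    return sum(sum(abs(p - q) for p, q in zip(t[a[j]], t[b[j]]))
               for j, t in enumerate(tables))


def smote_n(X, y, minority, k=3, seed=0):
    """Add synthetic minority rows until both classes have equal counts."""
    rng = random.Random(seed)
    tables = vdm_tables(X, y)
    minority_rows = [row for row, label in zip(X, y) if label == minority]
    n_needed = sum(1 for label in y if label != minority) - len(minority_rows)
    # k nearest minority neighbours of every minority row, by VDM.
    neighbours = []
    for i, row in enumerate(minority_rows):
        others = [m for idx, m in enumerate(minority_rows) if idx != i]
        others.sort(key=lambda m: vdm_distance(row, m, tables))
        neighbours.append(others[:k])
    synthetic = []
    for _ in range(max(n_needed, 0)):
        i = rng.randrange(len(minority_rows))
        group = [minority_rows[i]] + neighbours[i]
        # Per-feature majority vote; ties keep the base row's value because
        # Counter.most_common orders equal counts by first occurrence.
        synthetic.append(tuple(Counter(r[j] for r in group).most_common(1)[0][0]
                               for j in range(len(group[0]))))
    return list(X) + synthetic, list(y) + [minority] * len(synthetic)


def sensitivity_specificity_gmean(y_true, y_pred, positive):
    """Sensitivity, specificity and G-mean for a binary labelling."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    sens = tp / (tp + fn) if tp + fn else 0.0
    spec = tn / (tn + fp) if tn + fp else 0.0
    return sens, spec, math.sqrt(sens * spec)


if __name__ == "__main__":
    # Made-up rows: (age group, sex, psychological consequence).
    X = ([("10-14", "F", "no")] * 6 + [("15-19", "F", "no")] * 3
         + [("15-19", "M", "no")] * 3
         + [("15-19", "M", "yes"), ("20-24", "M", "yes"), ("15-19", "F", "yes")])
    y = ["no_abuse"] * 12 + ["abuse"] * 3
    X_bal, y_bal = smote_n(X, y, minority="abuse", k=2)
    print(Counter(y_bal))
    print(sensitivity_specificity_gmean(
        ["abuse", "abuse", "no_abuse", "no_abuse", "no_abuse"],
        ["abuse", "no_abuse", "no_abuse", "no_abuse", "abuse"],
        positive="abuse"))
```

Because SMOTE-N votes on categories instead of interpolating numbers, every synthetic row contains only values that already occur in the data, which is why it suits nominal survey variables such as sex or age group.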