Proceedings of the 2nd Universitas Kuningan International Conference on System, Engineering, and Technology, UNISET 2021, 2 December 2021, Kuningan, West Java, Indonesia

Research Article

Using Data Sampling Technique for Improving Classification of Covid-19 and Lung Diseases

  • @INPROCEEDINGS{10.4108/eai.2-12-2021.2320242,
        author={A R Purnajaya and F D Hanggara},
        title={Using Data Sampling Technique for Improving Classification of Covid-19 and Lung Diseases},
        proceedings={Proceedings of the 2nd Universitas Kuningan International Conference on System, Engineering, and Technology, UNISET 2021, 2 December 2021, Kuningan, West Java, Indonesia},
        publisher={EAI},
        proceedings_a={UNISET},
        year={2022},
        month={8},
        keywords={lung diseases; smote; random oversampling},
        doi={10.4108/eai.2-12-2021.2320242}
    }
    
A R Purnajaya1,*, F D Hanggara1
  • 1: Universal University, Indonesia
*Contact email: rezkipurnajaya@gmail.com

Abstract

Covid-19 is a viral disease that has spread around the world; it infects the respiratory tract and can be fatal. One way to address this problem is to classify Covid-19 chest X-rays. A key challenge in this area is improving classification performance: chest X-rays of Covid-19 and of other lung diseases have similar colors and patterns, which keeps classification performance below optimal. To address this, this study applies data sampling techniques to chest X-ray data for Covid-19 and 12 other types of lung disease. Four data sampling techniques are evaluated: Random Undersampling (RUS), Random Oversampling (ROS), Synthetic Minority Over-sampling Technique (SMOTE), and Tomek Link (T-Link). Support Vector Machines (SVM) are used to classify the data, and the techniques are compared on Area Under Curve (AUC) and accuracy. ROS was found to be the best data sampling technique, with an average increase in AUC and accuracy across all datasets of 31.5% and 3.4%, respectively. As a result, the ROS technique helps classify Covid-19 and other lung diseases more accurately.
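
For readers unfamiliar with random oversampling, the idea can be illustrated with a minimal sketch. This is not the authors' implementation, which the abstract does not describe; the function name `random_oversample`, the toy feature vectors, and the class labels are all hypothetical and chosen only for illustration. The sketch duplicates randomly chosen minority-class samples until every class is as large as the largest one.

```python
import random
from collections import Counter


def random_oversample(features, labels, seed=0):
    """Duplicate randomly chosen minority-class rows until every class
    has as many rows as the largest class (hypothetical helper)."""
    rng = random.Random(seed)

    # Group the rows by their class label.
    by_class = {}
    for row, label in zip(features, labels):
        by_class.setdefault(label, []).append(row)

    # Every class is grown to the size of the largest class.
    target = max(len(rows) for rows in by_class.values())

    out_features, out_labels = [], []
    for label, rows in by_class.items():
        # Sample with replacement from the class's own rows.
        extra = [rng.choice(rows) for _ in range(target - len(rows))]
        for row in rows + extra:
            out_features.append(row)
            out_labels.append(label)
    return out_features, out_labels


# Made-up toy data: 6 "other" rows versus 2 "covid" rows.
features = [[0.1, 0.2], [0.2, 0.1], [0.3, 0.3], [0.2, 0.4],
            [0.1, 0.5], [0.4, 0.1], [0.9, 0.8], [0.8, 0.9]]
labels = ["other"] * 6 + ["covid"] * 2

balanced_features, balanced_labels = random_oversample(features, labels)
print(Counter(balanced_labels))
```

After the call, both classes contain the same number of rows, and every added row is an exact copy of an existing minority-class row. In a typical evaluation, such resampling would be applied to the training split only, so that duplicated rows do not leak into the test data; whether the study did so is not stated in the abstract.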