Research Article
Using Data Sampling Technique for Improving Classification of Covid-19 and Lung Diseases
@INPROCEEDINGS{10.4108/eai.2-12-2021.2320242,
  author={A R Purnajaya and F D Hanggara},
  title={Using Data Sampling Technique for Improving Classification of Covid-19 and Lung Diseases},
  proceedings={Proceedings of the 2nd Universitas Kuningan International Conference on System, Engineering, and Technology, UNISET 2021, 2 December 2021, Kuningan, West Java, Indonesia},
  publisher={EAI},
  proceedings_a={UNISET},
  year={2022},
  month={8},
  keywords={lung diseases; smote; random oversampling},
  doi={10.4108/eai.2-12-2021.2320242}
}
A R Purnajaya
F D Hanggara
Year: 2022
UNISET
EAI
DOI: 10.4108/eai.2-12-2021.2320242
Abstract
Covid-19 is a viral disease that has spread around the world; it infects the respiratory tract and can be fatal. One approach to this problem is to classify Covid-19 chest X-rays, and a key challenge in this area is improving classification performance. Chest X-rays of Covid-19 and of other lung diseases have similar colors and patterns, which keeps classification performance from being optimal. To address this, this study applies data sampling techniques to Covid-19 chest X-ray data and chest X-ray data for 12 other types of lung disease. The data sampling techniques evaluated are Random Undersampling (RUS), Random Oversampling (ROS), Synthetic Minority Over-sampling Technique (SMOTE), and Tomek Link (T-Link). The study uses Support Vector Machines (SVM) to classify the data, and the techniques are compared on Area Under the Curve (AUC) and accuracy. ROS was found to be the best data sampling technique, with average increases in AUC and accuracy across all datasets of 31.5% and 3.4%, respectively. As a result, the ROS technique helps classify Covid-19 and other lung diseases more accurately.
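To make the two random sampling techniques named in the abstract concrete, the following is a minimal sketch of ROS and RUS in plain Python. It is not the authors' implementation: the function names, the toy feature values, and the class labels `"covid"` and `"other"` are assumptions made up for illustration, and the SVM classification and AUC evaluation steps are not shown.

```python
import random
from collections import Counter, defaultdict


def _group_by_class(samples, labels):
    """Collect the samples of each class into its own list."""
    by_class = defaultdict(list)
    for x, y in zip(samples, labels):
        by_class[y].append(x)
    return by_class


def random_oversample(samples, labels, seed=0):
    """ROS sketch: duplicate randomly chosen samples of the smaller classes
    until every class is as large as the largest class."""
    rng = random.Random(seed)
    by_class = _group_by_class(samples, labels)
    target = max(len(items) for items in by_class.values())
    out_x, out_y = [], []
    for y, items in by_class.items():
        # Draw with replacement to fill the gap up to the target size.
        extra = [rng.choice(items) for _ in range(target - len(items))]
        out_x.extend(items + extra)
        out_y.extend([y] * target)
    return out_x, out_y


def random_undersample(samples, labels, seed=0):
    """RUS sketch: keep a random subset of the larger classes so that
    every class is as small as the smallest class."""
    rng = random.Random(seed)
    by_class = _group_by_class(samples, labels)
    target = min(len(items) for items in by_class.values())
    out_x, out_y = [], []
    for y, items in by_class.items():
        # Draw without replacement down to the target size.
        out_x.extend(rng.sample(items, target))
        out_y.extend([y] * target)
    return out_x, out_y


# Hypothetical toy data: 6 majority-class and 2 minority-class samples.
X = [[0.1], [0.2], [0.3], [0.4], [0.5], [0.6], [0.9], [1.0]]
y = ["other"] * 6 + ["covid"] * 2

X_ros, y_ros = random_oversample(X, y)
X_rus, y_rus = random_undersample(X, y)
print(Counter(y_ros))
print(Counter(y_rus))
```

On this toy data, ROS leaves both classes with 6 samples and RUS leaves both with 2. ROS keeps every original sample and only adds duplicates, whereas RUS discards majority-class samples.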