
Research Article
Feature Importance Investigation for Estimating Covid-19 Infection by Random Forest Algorithm
@INPROCEEDINGS{10.1007/978-3-030-77417-2_20,
  author={Andr\'{e} Vin\'{\i}cius Gon\c{c}alves and Ione Jayce Ceola Schneider and Fernanda Vargas Amaral and Leandro Pereira Garcia and Gustavo Medeiros de Ara\'{u}jo},
  title={Feature Importance Investigation for Estimating Covid-19 Infection by Random Forest Algorithm},
  proceedings={Data and Information in Online Environments. Second EAI International Conference, DIONE 2021, Virtual Event, March 10--12, 2021, Proceedings},
  proceedings_a={DIONE},
  year={2021},
  month={6},
  keywords={Feature importance, Feature engineering, Machine learning, Prediction model, COVID-19},
  doi={10.1007/978-3-030-77417-2_20}
}
André Vinícius Gonçalves
Ione Jayce Ceola Schneider
Fernanda Vargas Amaral
Leandro Pereira Garcia
Gustavo Medeiros de Araújo
Year: 2021
DIONE
Springer
DOI: 10.1007/978-3-030-77417-2_20
Abstract
This work investigates feature importance for estimating COVID-19 infection using a machine learning approach. We analyzed 175 features with the Permutation Importance method to assess their importance and to list the twenty most relevant to the probability of infection. Among all features, the most important were: i) the period between the date of notification and symptom onset, ii) the rate of confirmed cases in the territory of the health units in the last 14 days, iii) the rate of discarded and removed cases in the health territory, iv) age, v) traffic flow variables, and vi) symptom features such as fever, cough, and sore throat. The model was validated and reached an average accuracy of 78.19%, with sensitivity and specificity of 83.05% and 75.50%, respectively, in the infection estimate. The proposed investigation therefore offers authorities an alternative for understanding aspects related to the disease.
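The workflow summarized in the abstract (a Random Forest classifier, permutation importance over 175 features, and a top-twenty ranking reported alongside accuracy, sensitivity, and specificity) can be sketched as below. This is a minimal illustration, not the authors' pipeline: the data are synthetic, generated with scikit-learn's `make_classification`, and the sample size, number of trees, train/test split, and number of permutation repeats are all assumptions chosen only to keep the example small.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the study data (assumption): 175 features,
# binary target playing the role of infected / not infected.
X, y = make_classification(
    n_samples=600, n_features=175, n_informative=20, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# Random Forest with illustrative hyperparameters (assumption).
model = RandomForestClassifier(n_estimators=50, random_state=0)
model.fit(X_train, y_train)

# Permutation importance: shuffle one feature at a time on held-out data
# and record the resulting drop in the model's score.
result = permutation_importance(
    model, X_test, y_test, n_repeats=3, random_state=0
)

# Indices of the twenty features with the largest mean importance.
top20 = np.argsort(result.importances_mean)[::-1][:20]

# Accuracy, sensitivity, and specificity from the binary confusion matrix.
tn, fp, fn, tp = confusion_matrix(y_test, model.predict(X_test)).ravel()
accuracy = (tp + tn) / (tp + tn + fp + fn)
sensitivity = tp / (tp + fn)  # true positive rate
specificity = tn / (tn + fp)  # true negative rate

print("Top-20 feature indices:", top20.tolist())
print(f"accuracy={accuracy:.3f} sensitivity={sensitivity:.3f} "
      f"specificity={specificity:.3f}")
```

With real data, the indices in `top20` would be mapped back to column names (for example, notification delay or age) to produce a ranked list like the one described above. The scores printed here come from synthetic data and are not comparable to the figures reported in the abstract.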