phat 20(23): e3

Research Article

Design of Novel ETL Model to Analyse Corona Virus Data

  • @ARTICLE{10.4108/eai.13-7-2018.165671,
        author={Amit Kumar Dewangan and S.M. Ghosh and Akhilesh Kumar Shrivas},
        title={Design of Novel ETL Model to Analyse Corona Virus Data},
        journal={EAI Endorsed Transactions on Pervasive Health and Technology},
        volume={6},
        number={23},
        publisher={EAI},
        journal_a={PHAT},
        year={2020},
        month={7},
        keywords={Corona Virus, Text Mining, Data Analytics, ETL, Covid-19, Pandemic},
        doi={10.4108/eai.13-7-2018.165671}
    }
    
  • Amit Kumar Dewangan
    S.M. Ghosh
    Akhilesh Kumar Shrivas
    Year: 2020
    Design of Novel ETL Model to Analyse Corona Virus Data
    PHAT
    EAI
    DOI: 10.4108/eai.13-7-2018.165671
Amit Kumar Dewangan1,*, S.M. Ghosh2, Akhilesh Kumar Shrivas3
  • 1: Department of Information Technology, Guru Ghasidas Vishwavidyalaya, Bilaspur, India
  • 2: Department of Computer Science and Engineering, Dr. C.V. Raman University, Kota, Bilaspur, India
  • 3: Department of Computer Science and Information Technology, Guru Ghasidas Vishwavidyalaya, Bilaspur, India
*Contact email: amit.nitrr@gmail.com

Abstract

INTRODUCTION: Coronavirus disease was first recognized in 2019 in Wuhan, the capital of China’s Hubei province. It has continued to spread since then and was consequently declared a pandemic. COVID-19 is a respiratory disease that affects people in different ways. Confirmed cases in India are increasing day by day, which has led to a complete nationwide lockdown.

OBJECTIVE: The objective of this research is to design a novel Extract-Transform-Load (NETL) model to analyse COVID-19 data in India.

METHODS: Extracting useful information from a large database is a research field closely connected to text mining. This paper proposes a novel extract-transform-load (ETL) model that processes COVID-19 data for India to obtain exact recovery data from multiple data sources across different Indian states. A knowledge-based model that generates knowledge using three modules (split, validation, and join) is discussed.

RESULTS: The outcome of the proposed NETL process is an output file that describes total positive cases, active cases, recovery cases, and death rate by region. NETL is evaluated on accuracy, failure count, and execution time. The proposed NETL process is more accurate and takes less compilation time, with a minimum failure count, compared with existing models.

CONCLUSION: To analyze coronavirus data in India, a novel ETL (NETL) model is proposed. In this model, a total of 9 CSV files are processed as input to produce results in different categories. The model has three modules: splitting, verification, and join. The dataset is split based on its coupling attributes and then joined with a single value to produce updated results for the current dataset. The last stage of the process joins the data generated through splitting. The proposed NETL model is more accurate than existing ETL models.
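
The split, validation, and join stages described in the abstract might be sketched roughly as follows. This is only an illustrative sketch, not the authors' implementation: the abstract does not give the CSV schema or the validation rules, so the column names (`state`, `confirmed`, `recovered`, `deaths`), the choice of `state` as the coupling attribute, the consistency checks, and the function names (`split`, `validate`, `join`, `run_netl`) are all assumptions made for illustration.

```python
import csv
from collections import defaultdict

# Hypothetical schema: the abstract does not list the real column names.
KEY = "state"
COUNT_FIELDS = ("confirmed", "recovered", "deaths")


def split(rows, key=KEY):
    """Split step: partition rows by the (assumed) coupling attribute."""
    parts = defaultdict(list)
    for row in rows:
        parts[(row.get(key) or "").strip()].append(row)
    return dict(parts)


def validate(row):
    """Validation step: return cleaned integer counts, or None if rejected."""
    if not (row.get(KEY) or "").strip():
        return None  # no region to attach the counts to
    try:
        counts = {f: int(row[f]) for f in COUNT_FIELDS}
    except (KeyError, TypeError, ValueError):
        return None  # missing or non-numeric count
    if min(counts.values()) < 0:
        return None
    if counts["recovered"] + counts["deaths"] > counts["confirmed"]:
        return None  # assumed consistency rule, not taken from the paper
    return counts


def join(parts):
    """Join step: merge each partition into one summary record per region."""
    summary, failures = {}, 0
    for region, rows in parts.items():
        totals = dict.fromkeys(COUNT_FIELDS, 0)
        valid = 0
        for row in rows:
            clean = validate(row)
            if clean is None:
                failures += 1
                continue
            valid += 1
            for f in COUNT_FIELDS:
                totals[f] += clean[f]
        if valid == 0:
            continue  # nothing usable for this region
        confirmed = totals["confirmed"]
        totals["active"] = confirmed - totals["recovered"] - totals["deaths"]
        totals["death_rate"] = totals["deaths"] / confirmed if confirmed else 0.0
        summary[region] = totals
    return summary, failures


def run_netl(sources):
    """Extract rows from several CSV text streams, then split, validate, join."""
    rows = []
    for src in sources:
        rows.extend(csv.DictReader(src))
    return join(split(rows))
```

Calling `run_netl` with a list of open CSV files (or `io.StringIO` objects) returns a per-region summary with positive, active, recovered, and death figures, plus a count of rejected rows that plays the role of the failure count mentioned in the abstract.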