e-Infrastructure and e-Services for Developing Countries. 11th EAI International Conference, AFRICOMM 2019, Porto-Novo, Benin, December 3–4, 2019, Proceedings

Research Article

Big Data Processing Using Hadoop and Spark: The Case of Meteorology Data

@INPROCEEDINGS{10.1007/978-3-030-41593-8_13,
    author={Eslam Hussein and Ronewa Sadiki and Yahlieel Jafta and Muhammad Sungay and Olasupo Ajayi and Antoine Bagula},
    title={Big Data Processing Using Hadoop and Spark: The Case of Meteorology Data},
    proceedings={e-Infrastructure and e-Services for Developing Countries. 11th EAI International Conference, AFRICOMM 2019, Porto-Novo, Benin, December 3--4, 2019, Proceedings},
    proceedings_a={AFRICOMM},
    year={2020},
    month={2},
    keywords={Hadoop MapReduce Spark Hive Meteorology Big data},
    doi={10.1007/978-3-030-41593-8_13}
}

Publisher: Springer
Eslam Hussein¹, Ronewa Sadiki¹, Yahlieel Jafta¹, Muhammad Sungay¹, Olasupo Ajayi, Antoine Bagula*
¹ University of the Western Cape
*Contact email: abagula@uwc.ac.za

Abstract

Meteorology is a branch of science that can be leveraged to gain useful insight into many phenomena that significantly affect our daily lives, such as precipitation, cyclones, thunderstorms and climate change. It is a highly data-driven field that involves large datasets of images captured by both radar and satellites, and therefore requires efficient technologies for storing, processing and mining these datasets to uncover hidden patterns. Different big data tools and ecosystems, most of them integrating Hadoop and Spark, have been designed to address big data issues. However, despite its importance, only a few works have applied these tools and ecosystems to meteorology problems. This paper proposes, and evaluates the performance of, a precipitation data processing system that builds on the Cloudera ecosystem and treats the analysis of large image datasets as a classification problem. The system can be used as a replacement for machine learning techniques when the classification task consists of identifying zones of high, moderate and low precipitation in satellite images.
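The abstract frames precipitation analysis as a three-class problem (high, moderate and low precipitation zones) processed with Hadoop and Spark. The following is a minimal sketch, not the authors' Cloudera implementation, of how such a rule-based classification maps onto the MapReduce pattern. It assumes that pixel intensities have already been extracted from the satellite images and that zones are assigned by fixed intensity thresholds. `LOW_MAX`, `MODERATE_MAX`, `classify`, `map_phase` and `reduce_phase` are hypothetical names and values chosen for illustration only.

```python
from collections import defaultdict
from typing import Iterable, Iterator

# Hypothetical thresholds on a 0-255 pixel intensity scale.
# The abstract does not state the thresholds the paper actually uses.
LOW_MAX = 85
MODERATE_MAX = 170

Key = tuple[str, str]  # (image_id, zone label)


def classify(intensity: int) -> str:
    """Assign a single pixel intensity to a precipitation zone label."""
    if intensity <= LOW_MAX:
        return "low"
    if intensity <= MODERATE_MAX:
        return "moderate"
    return "high"


def map_phase(image_id: str, pixels: Iterable[int]) -> Iterator[tuple[Key, int]]:
    """Mapper: emit ((image_id, zone), 1) for every pixel of one image."""
    for p in pixels:
        yield (image_id, classify(p)), 1


def reduce_phase(pairs: Iterable[tuple[Key, int]]) -> dict[Key, int]:
    """Reducer: sum the emitted counts for each (image_id, zone) key."""
    totals: dict[Key, int] = defaultdict(int)
    for key, count in pairs:
        totals[key] += count
    return dict(totals)


if __name__ == "__main__":
    # Toy input standing in for pixel data extracted from two images.
    images = {"img_001": [10, 40, 120, 200, 250, 90], "img_002": [5, 15, 30]}
    pairs = (kv for img, px in images.items() for kv in map_phase(img, px))
    print(reduce_phase(pairs))
```

On a cluster, the same two functions would play the role of a Hadoop mapper and reducer, or of a Spark `map` followed by `reduceByKey`. Real thresholds would have to be calibrated against measured precipitation data rather than fixed in advance.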