Research Article
Big Data Processing Using Hadoop and Spark: The Case of Meteorology Data
@INPROCEEDINGS{10.1007/978-3-030-41593-8_13,
  author={Eslam Hussein and Ronewa Sadiki and Yahlieel Jafta and Muhammad Sungay and Olasupo Ajayi and Antoine Bagula},
  title={Big Data Processing Using Hadoop and Spark: The Case of Meteorology Data},
  proceedings={e-Infrastructure and e-Services for Developing Countries. 11th EAI International Conference, AFRICOMM 2019, Porto-Novo, Benin, December 3--4, 2019, Proceedings},
  proceedings_a={AFRICOMM},
  year={2020},
  month={2},
  keywords={Hadoop, MapReduce, Spark, Hive, Meteorology, Big data},
  doi={10.1007/978-3-030-41593-8_13}
}
Authors: Eslam Hussein, Ronewa Sadiki, Yahlieel Jafta, Muhammad Sungay, Olasupo Ajayi, Antoine Bagula
Year: 2020
AFRICOMM
Springer
DOI: 10.1007/978-3-030-41593-8_13
Abstract
Meteorology is a branch of science that can be leveraged to gain useful insight into many phenomena with significant impacts on our daily lives, such as precipitation, cyclones, thunderstorms, and climate change. It is a highly data-driven field that involves large datasets of images captured by both radar and satellite, and therefore requires efficient technologies for storing, processing, and mining these datasets to find hidden patterns. Different big data tools and ecosystems, most of them integrating Hadoop and Spark, have been designed to address big data issues. However, despite the importance of meteorology, few works have applied these tools and ecosystems to meteorological problems. This paper proposes and evaluates the performance of a precipitation data processing system that builds upon the Cloudera ecosystem to analyse large datasets of images as a classification problem. The system can be used as an alternative to machine learning techniques when the classification problem consists of finding zones of high, moderate, and low precipitation in satellite images.