Proceedings of the 1st International Conference on Science and Technology for an Internet of Things, 20 October 2018, Yogyakarta, Indonesia

Research Article

Performance of K-means in Hadoop Using MapReduce Programming Model

@INPROCEEDINGS{10.4108/eai.19-10-2018.2282545,
    author={Engelbertus Vione and J.B. Budi Darmawan},
    title={Performance of K-means in Hadoop Using MapReduce Programming Model},
    proceedings={Proceedings of the 1st International Conference on Science and Technology for an Internet of Things, 20 October 2018, Yogyakarta, Indonesia},
    publisher={EAI},
    proceedings_a={ICSTI},
    year={2019},
    month={4},
    keywords={big data hadoop mapreduce mahout k-means},
    doi={10.4108/eai.19-10-2018.2282545}
}
    
Engelbertus Vione (1,*), J.B. Budi Darmawan (1)
1: Universitas Sanata Dharma
*Contact email: raikeiji@gmail.com

Abstract

Hadoop, one of the big data frameworks, uses the MapReduce programming model to analyze data. Mahout is a data analysis library that can run its algorithms using the MapReduce programming model. One of the clustering algorithms supported by Mahout is K-means. The researchers observe the speed of Mahout's K-means algorithm when clustering the Liver Disorders data set from the UCI repository, while changing the number of slave nodes in the Hadoop cluster. The study uses four computers, configured as one master node and three slave nodes, in a Hadoop cluster running on a local network. The average speed of the K-means process on 344 records shows that increasing the number of slave nodes from one to three increases the speed of the computation, but not linearly.
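To make the MapReduce framing concrete, the sketch below shows one common way a single K-means iteration can be split into a map phase and a reduce phase. It is a minimal, in-memory Python illustration under stated assumptions, not Mahout's actual implementation: the function names `kmeans_map`, `kmeans_reduce`, and `kmeans` are hypothetical, Euclidean distance is assumed, and keeping an empty cluster's centroid in place is an illustrative choice. On a real Hadoop cluster, the map phase would run in parallel on the slave nodes over input splits, and the reduce phase would aggregate the per-cluster sums.

```python
from collections import defaultdict
from math import dist
from typing import Dict, Iterable, Iterator, List, Sequence, Tuple

Point = Tuple[float, ...]


def kmeans_map(points: Iterable[Point],
               centroids: Sequence[Point]) -> Iterator[Tuple[int, Tuple[Point, int]]]:
    """Map phase (hypothetical): emit (nearest centroid index, (point, 1)) per point."""
    for p in points:
        # Assumption: Euclidean distance decides cluster membership.
        nearest = min(range(len(centroids)), key=lambda i: dist(p, centroids[i]))
        yield nearest, (p, 1)


def kmeans_reduce(pairs: Iterable[Tuple[int, Tuple[Point, int]]]) -> Dict[int, Point]:
    """Shuffle + reduce phase (hypothetical): sum coordinates per cluster, return means."""
    sums: Dict[int, List[float]] = {}
    counts: Dict[int, int] = defaultdict(int)
    for cid, (p, n) in pairs:
        if cid not in sums:
            sums[cid] = [0.0] * len(p)
        for d, x in enumerate(p):
            sums[cid][d] += x
        counts[cid] += n
    return {cid: tuple(s / counts[cid] for s in sums[cid]) for cid in sums}


def kmeans(points: Sequence[Point], centroids: Sequence[Point],
           max_iter: int = 10, tol: float = 1e-9) -> List[Point]:
    """Driver (hypothetical): one map/reduce round per iteration until centroids settle."""
    current = [tuple(c) for c in centroids]
    for _ in range(max_iter):
        new = kmeans_reduce(kmeans_map(points, current))
        # Assumption: a centroid that received no points keeps its previous position.
        updated = [new.get(i, c) for i, c in enumerate(current)]
        shift = max(dist(a, b) for a, b in zip(current, updated))
        current = updated
        if shift <= tol:
            break
    return current


if __name__ == "__main__":
    # Toy 2-D points for illustration only; not the UCI Liver Disorders data.
    pts = [(1.0, 1.0), (1.5, 2.0), (8.0, 8.0), (9.0, 8.5)]
    print(kmeans(pts, [(1.0, 1.0), (9.0, 8.5)]))
```

With the toy points above, the two groups separate after the first round and the second round leaves the centroids unchanged, so the driver stops. Because each iteration is a separate map/reduce round, the driver loop is the part that repeats as a job on the cluster.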