Performance of K-means in Hadoop Using MapReduce Programming Model

Engelbertus Vione; J.B. Budi Darmawan

Proceedings of the 1st International Conference on Science and Technology for an Internet of Things, 20 October 2018, Yogyakarta, Indonesia

Research Article

@INPROCEEDINGS{10.4108/eai.19-10-2018.2282545,
    author={Engelbertus Vione and J.B. Budi Darmawan},
    title={Performance of K-means in Hadoop Using MapReduce Programming Model},
    proceedings={Proceedings of the 1st International Conference on Science and Technology for an Internet of Things,  20 October 2018, Yogyakarta, Indonesia},
    publisher={EAI},
    proceedings_a={ICSTI},
    year={2019},
    month={4},
    keywords={big data hadoop mapreduce mahout k-means},
    doi={10.4108/eai.19-10-2018.2282545}
}

Engelbertus Vione, J.B. Budi Darmawan. 2019. Performance of K-means in Hadoop Using MapReduce Programming Model. ICSTI. EAI. DOI: 10.4108/eai.19-10-2018.2282545

Engelbertus Vione¹,*, J.B. Budi Darmawan¹

1: Universitas Sanata Dharma

*Contact email: raikeiji@gmail.com

Abstract

Hadoop, one of the big data frameworks, uses the MapReduce programming model to analyze data. Mahout is a data analysis library that can run on MapReduce. One of the clustering algorithms supported by Mahout is K-means. This study observes the processing speed of Mahout's K-means algorithm when clustering the liver disorders data set from UCI as the number of slave nodes in the Hadoop configuration changes. The study uses 4 computers, configured as 1 master node and 3 slave nodes, in a Hadoop cluster running on a local network. The average speed of the K-means process using 344 data sets indicates that increasing the number of slave nodes from one to three increases the speed of the computation non-linearly.

Keywords: big data, hadoop, mapreduce, mahout, k-means
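
As a rough illustration of how a K-means iteration can be phrased in map and reduce steps, the following single-process Python sketch may help. It is not Mahout's implementation and does not use Hadoop; the function names (`nearest`, `map_phase`, `reduce_phase`, `kmeans`), the toy points, and the starting centroids are all hypothetical and chosen only for the example.

```python
from collections import defaultdict


def nearest(point, centroids):
    """Return the index of the centroid closest to point (squared Euclidean distance)."""
    return min(
        range(len(centroids)),
        key=lambda i: sum((p - c) ** 2 for p, c in zip(point, centroids[i])),
    )


def map_phase(points, centroids):
    """Map step: emit one (cluster_index, point) pair per input record."""
    for point in points:
        yield nearest(point, centroids), point


def reduce_phase(pairs, centroids):
    """Reduce step: average the points grouped under each cluster index."""
    groups = defaultdict(list)
    for idx, point in pairs:
        groups[idx].append(point)
    new_centroids = list(centroids)  # a cluster with no points keeps its old centroid
    for idx, members in groups.items():
        new_centroids[idx] = tuple(sum(dim) / len(members) for dim in zip(*members))
    return new_centroids


def kmeans(points, centroids, max_iter=10):
    """Repeat map and reduce until the centroids stop changing or max_iter is reached."""
    for _ in range(max_iter):
        updated = reduce_phase(map_phase(points, centroids), centroids)
        if updated == centroids:
            break
        centroids = updated
    return centroids


# Toy data (hypothetical, not the UCI liver disorders data set).
points = [(1.0, 1.0), (1.0, 3.0), (9.0, 9.0), (11.0, 9.0)]
result = kmeans(points, [(0.0, 0.0), (10.0, 10.0)])
print(result)
```

In a real Hadoop job the map step would run in parallel on the slave nodes over separate splits of the input, which is where adding nodes can shorten the run time; the sketch above only shows the data flow.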

Published: 2019-04-14
Publisher: EAI

DOI: http://dx.doi.org/10.4108/eai.19-10-2018.2282545
