Research Article
Performance of K-means in Hadoop Using MapReduce Programming Model
@INPROCEEDINGS{10.4108/eai.19-10-2018.2282545,
  author={Engelbertus Vione and J.B. Budi Darmawan},
  title={Performance of K-means in Hadoop Using MapReduce Programming Model},
  proceedings={Proceedings of the 1st International Conference on Science and Technology for an Internet of Things, 20 October 2018, Yogyakarta, Indonesia},
  publisher={EAI},
  proceedings_a={ICSTI},
  year={2019},
  month={4},
  keywords={big data hadoop mapreduce mahout k-means},
  doi={10.4108/eai.19-10-2018.2282545}
}
Engelbertus Vione
J.B. Budi Darmawan
Year: 2019
ICSTI
EAI
DOI: 10.4108/eai.19-10-2018.2282545
Abstract
Hadoop, one of the big data frameworks, uses the MapReduce programming model to analyze data. Mahout is a data analysis library that can run on MapReduce. One of the clustering algorithms supported by Mahout is K-means. This study observes the speed of Mahout's K-means algorithm when clustering the liver disorder data set from UCI as the number of slave nodes in the Hadoop configuration changes. The study uses 4 computers, configured as 1 master node and 3 slave nodes, in a Hadoop cluster running on a local network. The average speed of the K-means process on 344 data records indicates that increasing the number of slave nodes from one to three increases the speed of the computation non-linearly.
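As a rough illustration of how K-means fits the MapReduce model, the following minimal sketch runs the map and reduce steps in a single Python process. It is not Mahout's implementation and does not use Hadoop; the function names (`map_phase`, `reduce_phase`, `kmeans`) and the sample points are hypothetical and chosen only for illustration. In a real cluster, the map step would run in parallel on the slave nodes, each handling a split of the input.

```python
from collections import defaultdict
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def squared_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two points of equal dimension."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def map_phase(points: List[Point], centroids: List[Point]) -> List[Tuple[int, Point]]:
    """Map step: emit (index of nearest centroid, point) for every input point."""
    pairs = []
    for p in points:
        nearest = min(range(len(centroids)),
                      key=lambda i: squared_distance(p, centroids[i]))
        pairs.append((nearest, p))
    return pairs


def reduce_phase(pairs: List[Tuple[int, Point]], centroids: List[Point]) -> List[Point]:
    """Reduce step: group points by centroid index and average each group."""
    groups = defaultdict(list)
    for cluster_id, p in pairs:
        groups[cluster_id].append(p)
    updated = list(centroids)  # a centroid with no points keeps its old position
    for cluster_id, members in groups.items():
        dim = len(members[0])
        updated[cluster_id] = tuple(
            sum(m[d] for m in members) / len(members) for d in range(dim)
        )
    return updated


def kmeans(points: List[Point], centroids: List[Point], max_iter: int = 10) -> List[Point]:
    """Repeat map and reduce until the centroids stop changing or max_iter is hit."""
    for _ in range(max_iter):
        updated = reduce_phase(map_phase(points, centroids), centroids)
        if updated == centroids:
            break
        centroids = updated
    return centroids


# Hypothetical toy data: two well-separated groups of two points each.
sample_points = [(0.0, 0.0), (0.0, 2.0), (10.0, 0.0), (10.0, 2.0)]
initial_centroids = [(1.0, 1.0), (9.0, 1.0)]
final_centroids = kmeans(sample_points, initial_centroids)
```

Each iteration corresponds to one MapReduce job, so the per-job scheduling and data transfer overhead is paid on every pass. This is one plausible reason why adding slave nodes would not speed up a small data set linearly, although the abstract itself does not state the cause.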