Collaborative Computing: Networking, Applications, and Worksharing. 11th International Conference, CollaborateCom 2015, Wuhan, November 10-11, 2015, China. Proceedings

Research Article

An ARM-Based Hadoop Performance Evaluation Platform: Design and Implementation

  • @INPROCEEDINGS{10.1007/978-3-319-28910-6_8,
        author={Xiaohu Fan and Si Chen and Shipeng Qi and Xincheng Luo and Jing Zeng and Hao Huang and Changsheng Xie},
        title={An ARM-Based Hadoop Performance Evaluation Platform: Design and Implementation},
        proceedings={Collaborative Computing: Networking, Applications, and Worksharing. 11th International Conference, CollaborateCom 2015, Wuhan, November 10-11, 2015, China. Proceedings},
        proceedings_a={COLLABORATECOM},
        year={2016},
        month={2},
        keywords={HPC ARM cluster Cost-effective Data-intensive},
        doi={10.1007/978-3-319-28910-6_8}
    }
    
  • Xiaohu Fan
    Si Chen
    Shipeng Qi
    Xincheng Luo
    Jing Zeng
    Hao Huang
    Changsheng Xie
    Year: 2016
    An ARM-Based Hadoop Performance Evaluation Platform: Design and Implementation
    COLLABORATECOM
    Springer
    DOI: 10.1007/978-3-319-28910-6_8
Xiaohu Fan1,*, Si Chen1,*, Shipeng Qi1,*, Xincheng Luo1,*, Jing Zeng1,*, Hao Huang1,*, Changsheng Xie2,*
  • 1: HUST
  • 2: Wuhan National Laboratory for Optoelectronics
*Contact email: fanxiaohu@hust.edu.cn, M201272616@hust.edu.cn, qishipeng@hust.edu.cn, luoxc613@hust.edu.cn, zengjing@hust.edu.cn, thao@hust.edu.cn, cs-xie@hust.edu.cn

Abstract

As clusters grow in scale, power consumption will become a major bottleneck for future large-scale high-performance clusters. However, most existing cloud clusters are built on power-hungry x86-64 processors, which are aimed mainly at common enterprise applications. In this paper, we improve cluster performance by leveraging energy-efficient ARM SoCs. On our prototype, a cluster of five Cubieboard4 boards, we run HPL and achieve 9.025 GFLOPS, which shows considerable computational potential. Moreover, we build a measurement model and conduct an extensive evaluation, comparing the performance of the cluster on workloads such as WordCount and k-Means running in MapReduce mode and in Spark mode. The experimental results demonstrate that our cluster achieves higher computational efficiency on compute-intensive workloads when using the RDD feature of Spark. Finally, we propose a theoretical hybrid architecture better suited to future cloud clusters, with a stronger master node and customized ARMv8-based TaskTrackers for data-intensive computing.