HConfig: Resource Adaptive Fast Bulk Loading in HBase

Xianqiang Bao; Ling Liu; Nong Xiao; Fang Liu; Qi Zhang; Tao Zhu

10th IEEE International Conference on Collaborative Computing: Networking, Applications and Worksharing

Research Article

Cite (BibTeX):

@INPROCEEDINGS{10.4108/icst.collaboratecom.2014.257304,
    author={Xianqiang Bao and Ling Liu and Nong Xiao and Fang Liu and Qi Zhang and Tao Zhu},
    title={HConfig: Resource Adaptive Fast Bulk Loading in HBase},
    proceedings={10th IEEE International Conference on Collaborative Computing: Networking, Applications and Worksharing},
    publisher={IEEE},
    proceedings_a={COLLABORATECOM},
    year={2014},
    month={11},
    keywords={hbase bulk loading optimization big data},
    doi={10.4108/icst.collaboratecom.2014.257304}
}

Xianqiang Bao¹, Ling Liu², Nong Xiao¹,*, Fang Liu¹, Qi Zhang², Tao Zhu²

1: State Key Laboratory of High Performance Computing, National University of Defense Technology
2: College of Computing, Georgia Institute of Technology

*Contact email: nongxiao@nudt.edu.cn

Abstract

NoSQL (Not only SQL) data stores have become a vital component of many big data computing platforms due to their inherent horizontal scalability. HBase is an open-source distributed NoSQL store that is widely used by many Internet enterprises to handle their big data computing applications (e.g., Facebook handles millions of messages each day with HBase). Optimizations that enhance the performance of HBase are of paramount interest for big data applications that use HBase or Bigtable-like key-value stores. In this paper we study the problems inherent in misconfiguration of HBase clusters, including scenarios where the HBase default configurations can lead to poor performance. We develop HConfig, a semi-automated configuration manager for optimizing HBase system performance along multiple dimensions. Due to space constraints, this paper focuses on how to improve the performance of the HBase data loader using HConfig. Through this case study we highlight the importance of resource-adaptive and workload-aware auto-configuration management and the design principles of HConfig. Our experiments show that HConfig-enhanced bulk loading significantly improves the performance of HBase bulk loading jobs compared to the HBase default configuration, achieving a 2x to 3.7x speedup in throughput across different numbers of client threads while maintaining linear horizontal scalability.

Keywords: HBase, bulk loading, optimization, big data

Published: 2014-11-11
Publisher: IEEE

DOI: http://dx.doi.org/10.4108/icst.collaboratecom.2014.257304
