Collaborative Computing: Networking, Applications and Worksharing. 13th International Conference, CollaborateCom 2017, Edinburgh, UK, December 11–13, 2017, Proceedings

Research Article

Performance Analysis of Storm in a Real-World Big Data Stream Computing Environment

  • @INPROCEEDINGS{10.1007/978-3-030-00916-8_57,
        author={Hongbin Yan and Dawei Sun and Shang Gao and Zhangbing Zhou},
        title={Performance Analysis of Storm in a Real-World Big Data Stream Computing Environment},
        proceedings={Collaborative Computing: Networking, Applications and Worksharing. 13th International Conference, CollaborateCom 2017, Edinburgh, UK, December 11--13, 2017, Proceedings},
        proceedings_a={COLLABORATECOM},
        year={2018},
        month={10},
        keywords={Storm, Performance analysis, Stream computing, Big data computing, Big data},
        doi={10.1007/978-3-030-00916-8_57}
    }
    
  • Hongbin Yan
    Dawei Sun
    Shang Gao
    Zhangbing Zhou
    Year: 2018
    Performance Analysis of Storm in a Real-World Big Data Stream Computing Environment
    COLLABORATECOM
    Springer
    DOI: 10.1007/978-3-030-00916-8_57
Hongbin Yan1,*, Dawei Sun1,*, Shang Gao2,*, Zhangbing Zhou1,*
  • 1: China University of Geosciences
  • 2: Deakin University
*Contact email: yanhongbin@cugb.edu.cn, sundaweicn@cugb.edu.cn, shang.gao@deakin.edu.au, zhangbing.zhou@gmail.com

Abstract

As an important distributed real-time computation system, Storm has been widely used in applications such as online machine learning, continuous computation, and distributed RPC. Storm is designed to process massive data streams in real time. However, few studies have evaluated the performance characteristics of Storm clusters. In this paper, we analyze the performance of a Storm cluster from two main aspects: hardware configuration and parallelism settings. We identify the key factors that affect the throughput and latency of a Storm cluster and evaluate the performance of Storm's fault-tolerance mechanism. These findings can help users run the system more efficiently.