Broadband Communications, Networks, and Systems. 12th EAI International Conference, BROADNETS 2021, Virtual Event, October 28–29, 2021, Proceedings

Research Article

A Machine Learning-Based Elastic Strategy for Operator Parallelism in a Big Data Stream Computing System

Cite
  • @INPROCEEDINGS{10.1007/978-3-030-93479-8_1,
        author={Wei Li and Dawei Sun and Shang Gao and Rajkumar Buyya},
        title={A Machine Learning-Based Elastic Strategy for Operator Parallelism in a Big Data Stream Computing System},
        proceedings={Broadband Communications, Networks, and Systems. 12th EAI International Conference, BROADNETS 2021, Virtual Event, October 28--29, 2021, Proceedings},
        proceedings_a={BROADNETS},
        year={2022},
        month={1},
        keywords={Operator parallelism, Runtime awareness, Resource allocation, Machine learning, Stream computing, Distributed system},
        doi={10.1007/978-3-030-93479-8_1}
    }
    
  • Wei Li
    Dawei Sun
    Shang Gao
    Rajkumar Buyya
    Year: 2022
    A Machine Learning-Based Elastic Strategy for Operator Parallelism in a Big Data Stream Computing System
    BROADNETS
    Springer
    DOI: 10.1007/978-3-030-93479-8_1
Wei Li1, Dawei Sun1,*, Shang Gao2, Rajkumar Buyya3
  • 1: School of Information Engineering, China University of Geosciences
  • 2: School of Information Technology, Deakin University, Melbourne
  • 3: Cloud Computing and Distributed Systems (CLOUDS) Laboratory, School of Computing and Information Systems
*Contact email: sundaweicn@cugb.edu.cn

Abstract

Elastic scaling of the operator parallelism degree, in or out, is needed to process real-time dynamic data streams under low-latency and high-stability requirements. Usually, the operator parallelism degree is set when a streaming application is submitted to a stream computing system and stays fixed during runtime. Because input streams fluctuate and the availability of system resources changes, a fixed setting can substantially degrade system performance. To address the problems caused by static parallelism settings, we propose and implement a machine learning-based elastic strategy for operator parallelism, named Me-Stream, for big data stream computing systems. We introduce the architecture of Me-Stream and its key models, including parallel bottleneck identification, parameter plan generation, parameter migration and conversion, and instance scheduling. The execution latency and process latency of the proposed scheduling strategy are evaluated on Apache Storm, a widely used big data stream computing system. The experimental results demonstrate the efficiency and effectiveness of the proposed strategy.
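
The abstract does not say how Me-Stream computes a new parallelism degree, so the sketch below is not the authors' method and uses no machine learning. It only illustrates what scaling an operator in or out means, using a simple capacity rule. The function name `target_parallelism`, its parameters, and the default bounds and headroom are all assumptions made for this illustration.

```python
import math


def target_parallelism(input_rate, per_instance_capacity,
                       min_degree=1, max_degree=16, headroom=0.8):
    """Pick a parallelism degree for one operator (illustrative rule only).

    input_rate: observed tuples per second arriving at the operator.
    per_instance_capacity: tuples per second one instance can process.
    headroom: fraction of capacity we are willing to use per instance
              (hypothetical safety margin, not taken from the paper).
    """
    if per_instance_capacity <= 0:
        raise ValueError("per_instance_capacity must be positive")
    if not 0 < headroom <= 1:
        raise ValueError("headroom must be in (0, 1]")

    # Instances needed so that each one stays below the headroom limit.
    needed = math.ceil(input_rate / (per_instance_capacity * headroom))

    # Clamp to the allowed range: scale out under load, scale in when idle.
    return max(min_degree, min(max_degree, needed))


if __name__ == "__main__":
    print(target_parallelism(1000, 200))   # → 7  (scale out)
    print(target_parallelism(100, 200))    # → 1  (scale in)
    print(target_parallelism(10000, 200))  # → 16 (capped at max_degree)
```

A static setting corresponds to calling this once at submission time and never again. An elastic strategy would re-evaluate the degree as the input rate changes, and per the abstract Me-Stream additionally handles bottleneck identification, parameter migration, and instance scheduling, none of which this sketch covers.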

Keywords
Operator parallelism, Runtime awareness, Resource allocation, Machine learning, Stream computing, Distributed system
Published
2022-01-01
Appears in
SpringerLink
http://dx.doi.org/10.1007/978-3-030-93479-8_1