Quality, Reliability, Security and Robustness in Heterogeneous Systems. 19th EAI International Conference, QShine 2023, Shenzhen, China, October 8 – 9, 2023, Proceedings, Part II

Research Article

Optimizing Computing Job Scheduling and Path Planning with Multi-objectives

Cite

BibTeX:

    @INPROCEEDINGS{10.1007/978-3-031-65123-6_11,
        author={Haoran Song and Chengxiao Yu and Kang Liu and Deyun Gao and Xuening Shang and Hanxiao Yan},
        title={Optimizing Computing Job Scheduling and Path Planning with Multi-objectives},
        proceedings={Quality, Reliability, Security and Robustness in Heterogeneous Systems. 19th EAI International Conference, QShine 2023, Shenzhen, China, October 8 -- 9, 2023, Proceedings, Part II},
        proceedings_a={QSHINE PART 2},
        year={2024},
        month={8},
        keywords={Job scheduling, Path planning, Parameter server, Parallel computing},
        doi={10.1007/978-3-031-65123-6_11}
    }

Plain text:

    Haoran Song, Chengxiao Yu, Kang Liu, Deyun Gao, Xuening Shang, Hanxiao Yan (2024). Optimizing Computing Job Scheduling and Path Planning with Multi-objectives. QSHINE PART 2. Springer. DOI: 10.1007/978-3-031-65123-6_11
Haoran Song1, Chengxiao Yu*, Kang Liu, Deyun Gao1, Xuening Shang1, Hanxiao Yan1
  • 1: School of Electronic and Information Engineering, Beijing Jiaotong University
*Contact email: yuchx@pcl.ac.cn

Abstract

Training machine learning models commonly relies on the parameter server architecture and data parallelism. In this setting it is important to meet three objectives simultaneously: guaranteeing user deadlines, saving energy, and using network bandwidth efficiently. In this paper, we formulate the problem as an integer programming model for the machine learning training scenario. We then propose a heuristic method, Computing Job Scheduling and Routing Planning (CSRP), to minimize the violation rate of user deadlines, the number of servers used, and the network cost. CSRP schedules computing jobs and selects paths based on the characteristics of each computing job and the network status. Because parallel computing tasks require the same computing parameters, bandwidth consumption can be further reduced by path aggregation. We therefore also propose Aggregated CSRP, which selects the aggregation node and the aggregated paths. Trace-driven experiments show that CSRP and Aggregated CSRP outperform other methods in terms of deadline guarantee, energy saving, and efficient network bandwidth usage.
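
The bandwidth argument behind path aggregation can be illustrated with a small sketch. This is not the paper's Aggregated CSRP algorithm, which the abstract does not specify; it is a brute-force toy under assumed conditions: an undirected topology, hop count as the only cost, and one copy of the parameters sent from a parameter server to every worker. The topology `GRAPH`, the node names, and the functions `hop_counts`, `unicast_cost`, and `best_aggregation_node` are all hypothetical names introduced for illustration.

```python
from collections import deque

# Hypothetical toy topology (assumption): ps - s1 - s2, with three workers on s2.
GRAPH = {
    "ps": ["s1"],
    "s1": ["ps", "s2"],
    "s2": ["s1", "w1", "w2", "w3"],
    "w1": ["s2"],
    "w2": ["s2"],
    "w3": ["s2"],
}
WORKERS = ["w1", "w2", "w3"]


def hop_counts(graph, source):
    """Breadth-first search: hop count from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbor in graph[node]:
            if neighbor not in dist:
                dist[neighbor] = dist[node] + 1
                queue.append(neighbor)
    return dist


def unicast_cost(graph, server, workers):
    """Link traversals when the server sends a separate copy to each worker."""
    dist = hop_counts(graph, server)
    return sum(dist[w] for w in workers)


def best_aggregation_node(graph, server, workers):
    """Try every node as the aggregation point and keep the cheapest.

    Cost = one copy from the server to the node, then one copy from the
    node to each worker. Exhaustive search, only sensible for tiny graphs.
    """
    from_server = hop_counts(graph, server)
    best_node, best_cost = None, None
    for node in sorted(graph):
        from_node = hop_counts(graph, node)
        cost = from_server[node] + sum(from_node[w] for w in workers)
        if best_cost is None or cost < best_cost:
            best_node, best_cost = node, cost
    return best_node, best_cost


if __name__ == "__main__":
    print("unicast link traversals:", unicast_cost(GRAPH, "ps", WORKERS))
    print("best aggregation (node, cost):", best_aggregation_node(GRAPH, "ps", WORKERS))
```

In this toy topology, three separate paths share the links ps-s1 and s1-s2, so aggregating at the last shared switch avoids sending the same parameters over those links three times. A real scheduler would also have to account for link capacities, deadlines, and server placement, which this sketch ignores.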

Keywords
Job scheduling, Path planning, Parameter server, Parallel computing
Published
2024-08-20
Appears in
SpringerLink
http://dx.doi.org/10.1007/978-3-031-65123-6_11
Copyright © 2023–2025 ICST