
Research Article
Optimizing Computing Job Scheduling and Path Planning with Multi-objectives
@INPROCEEDINGS{10.1007/978-3-031-65123-6_11,
  author={Haoran Song and Chengxiao Yu and Kang Liu and Deyun Gao and Xuening Shang and Hanxiao Yan},
  title={Optimizing Computing Job Scheduling and Path Planning with Multi-objectives},
  proceedings={Quality, Reliability, Security and Robustness in Heterogeneous Systems. 19th EAI International Conference, QShine 2023, Shenzhen, China, October 8 -- 9, 2023, Proceedings, Part II},
  proceedings_a={QSHINE PART 2},
  year={2024},
  month={8},
  keywords={Job scheduling, Path planning, Parameter server, Parallel computing},
  doi={10.1007/978-3-031-65123-6_11}
}
Haoran Song
Chengxiao Yu
Kang Liu
Deyun Gao
Xuening Shang
Hanxiao Yan
Year: 2024
QSHINE PART 2
Springer
DOI: 10.1007/978-3-031-65123-6_11
Abstract
Machine learning model training relies on the parameter server architecture and the data-parallel mechanism. It is important to guarantee deadlines, save energy, and use network bandwidth efficiently at the same time. In this paper, we formulate this problem as an integer programming model in the scenario of machine learning training. We then propose a heuristic Computing Job Scheduling and Routing Planning (CSRP) method to minimize the violation rate of user deadlines, the number of servers used, and the network cost. CSRP schedules computing jobs and selects paths based on computing job characteristics and network status. Because parallel computations require the same computing parameters, bandwidth consumption can be further reduced by path aggregation. We therefore propose Aggregated CSRP, which selects the aggregation node and the aggregated paths. We evaluate the proposed algorithms in trace-driven experiments; the results show that CSRP and Aggregated CSRP outperform other methods in terms of deadline guarantee, energy saving, and efficient network bandwidth usage.