Deep Reinforcement Learning for Multi-Sequence Combinatorial Optimization Problems

Dai Zhang; Xue Zhang

Proceedings of the 2nd International Conference on Machine Learning and Automation, CONF-MLA 2024, November 21, 2024, Adana, Turkey

Research Article

Deep Reinforcement Learning for Multi-Sequence Combinatorial Optimization Problems

Download872 downloads

Cite: BibTeX Plain Text

@INPROCEEDINGS{10.4108/eai.21-11-2024.2354636,
    author={Dai  Zhang and Xue  Zhang},
    title={Deep Reinforcement Learning for Multi-Sequence Combinatorial Optimization Problems},
    proceedings={Proceedings of the 2nd International Conference on Machine Learning and Automation, CONF-MLA 2024, November 21, 2024, Adana, Turkey},
    publisher={EAI},
    proceedings_a={CONF-MLA},
    year={2025},
    month={3},
    keywords={combinatorial optimization problem multiple-sequence cop deep reinforcement learning},
    doi={10.4108/eai.21-11-2024.2354636}
}

Dai Zhang
Xue Zhang
Year: 2025
Deep Reinforcement Learning for Multi-Sequence Combinatorial Optimization Problems
CONF-MLA
EAI
DOI: 10.4108/eai.21-11-2024.2354636

Dai Zhang¹^,*, Xue Zhang¹

1: Hitachi (China) Ltd., Corp

*Contact email: zhangd@hitachi.cn

Abstract

We propose a formulation for solving sequential combinatorial optimization problems (COP) by deep reinforcement learning (DRL) method. Some types of decision problems can be reduced to sequence combination optimization problems(SCOP) , such as knapsack problems, flow-shop problems, etc. Besides some single sequence combinatorial optimization problems, multi-sequence combinatorial optimization problems have more complex constraints and are difficult to solve quickly in finite time. In this paper we extend a DRL model based on multiple pointer networks and policy gradient approach to solve multiple-sequence COP. We take flow-shop problem as example to verify the efficiency of improved model. As the result shows, the DRL method can calculate better results within 1 minute than some heuristic algorithms that should optimize with 0.5 hour or 1 hour time consumption. Due to its randomness heuristic optimization method can also produce better solutions, but the difference between DRL and the best heuristic results is within 5%. Therefore, compared to heuristic optimization methods, DRL has competitive optimization capabilities and significantly higher computational efficiency.

Keywords: combinatorial optimization problem, multiple-sequence cop, deep reinforcement learning

Published: 2025-03-11
Publisher: EAI

: http://dx.doi.org/10.4108/eai.21-11-2024.2354636

Deep Reinforcement Learning for Multi-Sequence Combinatorial Optimization Problems

Abstract

About EAI

Community

Publish with EAI