
Research Article
Solving Traveling Salesman Problem with Deep Reinforcement Learning and Knowledge Distillation
@INPROCEEDINGS{10.1007/978-3-031-65123-6_26, author={Xiaowen Li and Xiaofeng Gao and Shaoyao Niu and Wenxuan He and Wanru Gao and Qidong Liu}, title={Solving Traveling Salesman Problem with Deep Reinforcement Learning and Knowledge Distillation}, proceedings={Quality, Reliability, Security and Robustness in Heterogeneous Systems. 19th EAI International Conference, QShine 2023, Shenzhen, China, October 8 -- 9, 2023, Proceedings, Part II}, proceedings_a={QSHINE PART 2}, year={2024}, month={8}, keywords={traveling salesman problem (TSP) knowledge distillation reinforcement learning}, doi={10.1007/978-3-031-65123-6_26} }
- Xiaowen Li
Xiaofeng Gao
Shaoyao Niu
Wenxuan He
Wanru Gao
Qidong Liu
Year: 2024
Solving Traveling Salesman Problem with Deep Reinforcement Learning and Knowledge Distillation
QSHINE PART 2
Springer
DOI: 10.1007/978-3-031-65123-6_26
Abstract
The Traveling Salesman Problem (TSP) is a renowned combinatorial optimization problem with wide-ranging practical applications. However, TSP is an NP-hard problem, which makes finding an efficient and accurate solution computationally challenging. While deep learning and reinforcement learning have shown promise in solving TSP, prevailing methods suffer from some limitations, such as excessive model complexity and long inference durations. In this work, we propose a method for solving TSP, which extracts knowledge from a complex teacher model to a lightweight student model. The teacher model adopts an encoder-decoder framework and uses the mixed chunk attention mechanism to extract the features of the input city sequence. The student model adopts the same architecture as the teacher model, but simplifies network parameters by decreasing the dimensionality of hidden layers via knowledge distillation. Instead of using a tour generated by the teacher model to guide the student model, our teacher model guides the student model at each time step. Extensive experiments demonstrate the competitive and effective nature of our model.