
Research Article
Multi-D3QN: A Multi-strategy Deep Reinforcement Learning for Service Composition in Cloud Manufacturing
@INPROCEEDINGS{10.1007/978-3-030-92638-0_14,
  author={Jun Zeng and Juan Yao and Yang Yu and Yingbo Wu},
  title={Multi-D3QN: A Multi-strategy Deep Reinforcement Learning for Service Composition in Cloud Manufacturing},
  proceedings={Collaborative Computing: Networking, Applications and Worksharing. 17th EAI International Conference, CollaborateCom 2021, Virtual Event, October 16-18, 2021, Proceedings, Part II},
  proceedings_a={COLLABORATECOM PART 2},
  year={2022},
  month={1},
  keywords={Cloud manufacturing; Dynamic service composition; Quality of service; Deep reinforcement learning},
  doi={10.1007/978-3-030-92638-0_14}
}
- Jun Zeng
- Juan Yao
- Yang Yu
- Yingbo Wu
Year: 2022
Multi-D3QN: A Multi-strategy Deep Reinforcement Learning for Service Composition in Cloud Manufacturing
COLLABORATECOM PART 2
Springer
DOI: 10.1007/978-3-030-92638-0_14
Abstract
Service composition is an indispensable technology in cloud manufacturing for ensuring the smooth execution of tasks. To implement effective and accurate service composition strategies, many researchers use meta-heuristic algorithms with strong optimization capabilities. However, as users' demand for personalized products increases, dynamic service composition becomes essential. Meta-heuristic algorithms lack dynamic adaptability, so they are not suitable for solving complex and dynamic service composition problems. Deep Reinforcement Learning (DRL) algorithms, in turn, have difficulty reaching a stable state when their hyper-parameters and rewards are not properly designed. To solve these problems, we propose a multi-strategy DRL algorithm, named Multi-D3QN, which combines the basic DQN algorithm, the dueling architecture, the double estimator, and the prioritized replay mechanism. Meanwhile, we add strategies such as instant reward, the ɛ-greedy policy, and a heuristic strategy to ensure better performance of the algorithm in a dynamic environment. Experiments show that our proposed method not only adapts to the dynamic environment but also obtains better solutions.
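To make the components named in the abstract concrete, the following minimal sketch (not taken from the paper; the network sizes, function names, and hyper-parameters are our own assumptions) shows, in PyTorch, the three standard building blocks of a D3QN-style agent: a dueling Q-network, ɛ-greedy action selection, and the double-estimator target.

# Illustrative sketch only, not the authors' implementation. Hidden size, gamma,
# and function names are assumptions; the paper's prioritized replay, instant
# reward, and heuristic strategy would sit on top of these pieces.
import random
import torch
import torch.nn as nn

class DuelingQNet(nn.Module):
    """Dueling architecture: Q(s, a) = V(s) + A(s, a) - mean_a A(s, a)."""
    def __init__(self, state_dim: int, n_actions: int, hidden: int = 64):
        super().__init__()
        self.feature = nn.Sequential(nn.Linear(state_dim, hidden), nn.ReLU())
        self.value = nn.Linear(hidden, 1)              # state-value stream V(s)
        self.advantage = nn.Linear(hidden, n_actions)  # advantage stream A(s, a)

    def forward(self, state: torch.Tensor) -> torch.Tensor:
        h = self.feature(state)
        v = self.value(h)
        a = self.advantage(h)
        return v + a - a.mean(dim=-1, keepdim=True)

def epsilon_greedy(q_net: DuelingQNet, state: torch.Tensor, epsilon: float) -> int:
    """With probability epsilon explore a random action; otherwise exploit argmax Q."""
    n_actions = q_net.advantage.out_features
    if random.random() < epsilon:
        return random.randrange(n_actions)
    with torch.no_grad():
        return int(q_net(state).argmax(dim=-1).item())

def double_dqn_target(online: DuelingQNet, target: DuelingQNet,
                      reward: torch.Tensor, next_state: torch.Tensor,
                      done: torch.Tensor, gamma: float = 0.99) -> torch.Tensor:
    """Double estimator: the online net selects the next action, the target net evaluates it."""
    with torch.no_grad():
        next_action = online(next_state).argmax(dim=-1, keepdim=True)
        next_q = target(next_state).gather(-1, next_action).squeeze(-1)
        return reward + gamma * (1.0 - done) * next_q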