
Research Article
Service Function Chain Placement in Cloud Data Center Networks: A Cooperative Multi-agent Reinforcement Learning Approach
@INPROCEEDINGS{10.1007/978-3-031-23141-4_22,
  author={Lynn Gao and Yutian Chen and Bin Tang},
  title={Service Function Chain Placement in Cloud Data Center Networks: A Cooperative Multi-agent Reinforcement Learning Approach},
  proceedings={Game Theory for Networks. 11th International EAI Conference, GameNets 2022, Virtual Event, July 7--8, 2022, Proceedings},
  proceedings_a={GAMENETS},
  year={2023},
  month={1},
  keywords={Service function chaining, Data centers, Reinforcement learning, k-stroll Problem},
  doi={10.1007/978-3-031-23141-4_22}
}
Lynn Gao
Yutian Chen
Bin Tang
Year: 2023
GAMENETS
Springer
DOI: 10.1007/978-3-031-23141-4_22
Abstract
Service function chaining (SFC), consisting of a sequence of virtual network functions (VNFs) (e.g., firewalls and load balancers), is an effective service provision technique in modern data center networks. By requiring cloud user traffic to traverse the VNFs in order, SFC improves the security and performance of cloud user applications. In this paper, we study how to place an SFC inside a data center to minimize the network traffic of virtual machine (VM) communication. We take a cooperative multi-agent reinforcement learning approach, wherein multiple agents collaboratively learn a traffic-efficient route for the VM communication.
Underlying the SFC placement is a fundamental graph-theoretical problem called the k-stroll problem. Given a weighted graph G(V, E), two nodes s, t ∈ V, and an integer k, the k-stroll problem is to find the shortest path from s to t that visits at least k other nodes in the graph. Our work is the first to take a multi-agent learning approach to solving the k-stroll problem. We compare our learning algorithm with an optimal, exhaustive algorithm and an existing dynamic programming (DP)-based heuristic algorithm. We show that our learning algorithm, although lacking the complete knowledge of the network assumed by existing research, delivers comparable or even better VM communication time while taking two orders of magnitude less execution time.
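To make the k-stroll definition above concrete, the following is a minimal Python sketch of an exhaustive solver for small graphs. It is an illustration only, not the authors' learning algorithm and not necessarily their exhaustive baseline. The function names (`all_pairs_shortest`, `k_stroll_bruteforce`), the undirected-graph assumption, and the toy graph are all hypothetical. The sketch assumes the route may revisit nodes, so it first computes all-pairs shortest distances and then tries every ordered choice of k intermediate nodes.

```python
from itertools import permutations

def all_pairs_shortest(nodes, edges):
    """Floyd-Warshall distances on an undirected weighted graph (assumed)."""
    inf = float("inf")
    d = {u: {v: (0 if u == v else inf) for v in nodes} for u in nodes}
    for u, v, w in edges:
        d[u][v] = min(d[u][v], w)
        d[v][u] = min(d[v][u], w)
    for m in nodes:
        for u in nodes:
            for v in nodes:
                if d[u][m] + d[m][v] < d[u][v]:
                    d[u][v] = d[u][m] + d[m][v]
    return d

def k_stroll_bruteforce(nodes, edges, s, t, k):
    """Return (cost, stops) of a cheapest s-to-t route through k other nodes.

    Exhaustive over ordered k-subsets of V minus {s, t}; exponential in k,
    so only suitable for tiny instances.
    """
    d = all_pairs_shortest(nodes, edges)
    others = [v for v in nodes if v not in (s, t)]
    best_cost, best_stops = float("inf"), None
    for order in permutations(others, k):
        stops = (s, *order, t)
        # Sum shortest-path distances between consecutive stops.
        cost = sum(d[a][b] for a, b in zip(stops, stops[1:]))
        if cost < best_cost:
            best_cost, best_stops = cost, stops
    return best_cost, best_stops

# Toy instance (hypothetical): s and t play the role of the two VMs,
# a and b are candidate locations for VNFs.
nodes = ["s", "a", "b", "t"]
edges = [("s", "a", 1), ("a", "t", 1), ("s", "b", 2), ("b", "t", 2), ("a", "b", 1)]

cost_k1, stops_k1 = k_stroll_bruteforce(nodes, edges, "s", "t", 1)
cost_k2, stops_k2 = k_stroll_bruteforce(nodes, edges, "s", "t", 2)
```

In this toy graph, visiting one other node costs 2 (via `a`), while visiting two costs 4. The number of candidate orderings grows factorially with k, which is why an exhaustive search of this kind is only practical as a reference point on small inputs.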