Collaborative Computing: Networking, Applications and Worksharing. 16th EAI International Conference, CollaborateCom 2020, Shanghai, China, October 16–18, 2020, Proceedings, Part II

Research Article

BiC-DDPG: Bidirectionally-Coordinated Nets for Deep Multi-agent Reinforcement Learning

  • @INPROCEEDINGS{10.1007/978-3-030-67540-0_20,
        author={Gongju Wang and Dianxi Shi and Chao Xue and Hao Jiang and Yajie Wang},
        title={BiC-DDPG: Bidirectionally-Coordinated Nets for Deep Multi-agent Reinforcement Learning},
        proceedings={Collaborative Computing: Networking, Applications and Worksharing. 16th EAI International Conference, CollaborateCom 2020, Shanghai, China, October 16--18, 2020, Proceedings, Part II},
        proceedings_a={COLLABORATECOM PART 2},
        year={2021},
        month={1},
        keywords={Multi-agent deep reinforcement learning, Large discrete joint action space, Cooperative, Mapping method},
        doi={10.1007/978-3-030-67540-0_20}
    }
    
Gongju Wang1, Dianxi Shi1,*, Chao Xue1, Hao Jiang2, Yajie Wang2
  • 1: Artificial Intelligence Research Center (AIRC), National Innovation Institute of Defense Technology (NIIDT)
  • 2: College of Computer, National University of Defense Technology
*Contact email: dxshi@nudt.edu.cn

Abstract

Multi-agent reinforcement learning (MARL) often faces the problem of policy learning under a large action space. The action space is complex for two reasons: first, the decision space of a single agent in a multi-agent system is huge; second, the joint action space formed by combining the action spaces of individual agents grows exponentially with the number of agents. Learning a robust policy in multi-agent cooperative scenarios is therefore a challenge. To address this challenge we propose an algorithm called bidirectionally-coordinated Deep Deterministic Policy Gradient (BiC-DDPG). BiC-DDPG incorporates three mechanisms designed around our insights into the challenge: a centralized-training, decentralized-execution architecture that preserves the Markov property and thus the convergence of the algorithm; bidirectional RNN structures that enable information exchange among cooperating agents; and a mapping method that projects the continuous joint-action output onto the discrete joint action space, addressing agents' decision-making over large joint action spaces. We designed a series of fine-grained experiments, including scenarios with both cooperative and adversarial relationships between homogeneous agents, to evaluate the algorithm. The experimental results show that our algorithm outperforms the baselines.
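The abstract's second and third mechanisms lend themselves to a short illustration. Below is a minimal, hypothetical PyTorch sketch of the actor side: a bidirectional GRU passes information across the agent axis before each agent emits a continuous proto-action, which is then mapped to the nearest entry of a table of discrete action embeddings. All names here (BiCActor, nearest_discrete_action, the hidden sizes, and the nearest-neighbour mapping itself) are assumptions for illustration, not the authors' implementation.

import torch
import torch.nn as nn

class BiCActor(nn.Module):
    """Actor sketch: each agent's observation is encoded, then a bidirectional
    GRU run over the *agent* axis lets agent i see summaries of agents
    1..i-1 (forward pass) and i+1..n (backward pass) before acting."""

    def __init__(self, obs_dim: int, act_dim: int, hidden: int = 64):
        super().__init__()
        self.encode = nn.Linear(obs_dim, hidden)
        self.comm = nn.GRU(hidden, hidden, bidirectional=True, batch_first=True)
        self.head = nn.Linear(2 * hidden, act_dim)

    def forward(self, obs: torch.Tensor) -> torch.Tensor:
        # obs: (batch, n_agents, obs_dim)
        h = torch.relu(self.encode(obs))
        h, _ = self.comm(h)              # (batch, n_agents, 2 * hidden)
        return torch.tanh(self.head(h))  # continuous proto-actions in [-1, 1]

def nearest_discrete_action(proto: torch.Tensor, action_table: torch.Tensor) -> torch.Tensor:
    """Map each agent's continuous proto-action to the index of the closest
    discrete action embedding (a nearest-neighbour reading of the paper's
    continuous-to-discrete mapping; an assumption, not the authors' method)."""
    # proto: (batch, n_agents, act_dim); action_table: (n_discrete, act_dim)
    dists = torch.cdist(proto.flatten(0, 1), action_table)  # pairwise L2 distances
    return dists.argmin(dim=-1).view(proto.shape[0], proto.shape[1])

# Usage: 3 agents, 8-dim observations, 16 discrete actions embedded in R^4.
actor = BiCActor(obs_dim=8, act_dim=4)
obs = torch.randn(2, 3, 8)          # batch of 2 joint observations
table = torch.randn(16, 4)          # one embedding per discrete action
discrete_ids = nearest_discrete_action(actor(obs), table)
print(discrete_ids.shape)           # torch.Size([2, 3])

This sketch covers only the decentralized actor; the centralized critic used during training under the CTDE scheme, and the exact form of the paper's mapping method, are omitted.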

Keywords
Multi-agent deep reinforcement learning, Large discrete joint action space, Cooperative, Mapping method
Published
2021-01-22
Appears in
SpringerLink
http://dx.doi.org/10.1007/978-3-030-67540-0_20