Collaborative Computing: Networking, Applications and Worksharing. 16th EAI International Conference, CollaborateCom 2020, Shanghai, China, October 16–18, 2020, Proceedings, Part II

Research Article

BiC-DDPG: Bidirectionally-Coordinated Nets for Deep Multi-agent Reinforcement Learning

  • @INPROCEEDINGS{10.1007/978-3-030-67540-0_20,
        author={Gongju Wang and Dianxi Shi and Chao Xue and Hao Jiang and Yajie Wang},
        title={BiC-DDPG: Bidirectionally-Coordinated Nets for Deep Multi-agent Reinforcement Learning},
        proceedings={Collaborative Computing: Networking, Applications and Worksharing. 16th EAI International Conference, CollaborateCom 2020, Shanghai, China, October 16--18, 2020, Proceedings, Part II},
        proceedings_a={COLLABORATECOM PART 2},
        year={2021},
        month={1},
        keywords={Multi-agent deep reinforcement learning, Large discrete joint action space, Cooperative, Mapping method},
        doi={10.1007/978-3-030-67540-0_20}
    }
    
Gongju Wang1, Dianxi Shi1,*, Chao Xue1, Hao Jiang2, Yajie Wang2
  • 1: Artificial Intelligence Research Center (AIRC), National Innovation Institute of Defense Technology (NIIDT)
  • 2: College of Computer, National University of Defense Technology
*Contact email: dxshi@nudt.edu.cn

Abstract

Multi-agent reinforcement learning (MARL) often faces the problem of policy learning under a large action space. The action space is complex for two reasons: first, the decision space of a single agent in a multi-agent system is huge; second, the joint action space formed by combining the action spaces of individual agents grows exponentially with the number of agents. Learning a robust policy in multi-agent cooperative scenarios is therefore a challenge. To address this challenge we propose an algorithm called bidirectionally-coordinated Deep Deterministic Policy Gradient (BiC-DDPG). BiC-DDPG incorporates three mechanisms designed around our insights into the challenge: a centralized-training, decentralized-execution architecture that preserves the Markov property and thus the convergence of the algorithm; bidirectional RNN structures that enable information exchange among cooperating agents; and a mapping method that projects the continuous joint-action output onto the discrete joint action space, addressing agents' decision-making over large joint action spaces. We designed a series of fine-grained experiments, including scenarios with both cooperative and adversarial relationships between homogeneous agents, to evaluate the algorithm. The experimental results show that our algorithm outperforms the baselines.
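The abstract's second and third mechanisms lend themselves to a short illustration. Below is a minimal, hypothetical PyTorch sketch of the actor side: a bidirectional GRU passes information across the agent axis before each agent emits a continuous proto-action, which is then mapped to the nearest entry of a table of discrete action embeddings. All names here (BiCActor, nearest_discrete_action, the hidden sizes, and the nearest-neighbour mapping itself) are assumptions for illustration, not the authors' implementation.

import torch
import torch.nn as nn

class BiCActor(nn.Module):
    """Actor sketch: each agent's observation is encoded, then a bidirectional
    GRU run over the *agent* axis lets agent i see summaries of agents
    1..i-1 (forward pass) and i+1..n (backward pass) before acting."""

    def __init__(self, obs_dim: int, act_dim: int, hidden: int = 64):
        super().__init__()
        self.encode = nn.Linear(obs_dim, hidden)
        self.comm = nn.GRU(hidden, hidden, bidirectional=True, batch_first=True)
        self.head = nn.Linear(2 * hidden, act_dim)

    def forward(self, obs: torch.Tensor) -> torch.Tensor:
        # obs: (batch, n_agents, obs_dim)
        h = torch.relu(self.encode(obs))
        h, _ = self.comm(h)              # (batch, n_agents, 2 * hidden)
        return torch.tanh(self.head(h))  # continuous proto-actions in [-1, 1]

def nearest_discrete_action(proto: torch.Tensor, action_table: torch.Tensor) -> torch.Tensor:
    """Map each agent's continuous proto-action to the index of the closest
    discrete action embedding (a nearest-neighbour reading of the paper's
    continuous-to-discrete mapping; an assumption, not the authors' method)."""
    # proto: (batch, n_agents, act_dim); action_table: (n_discrete, act_dim)
    dists = torch.cdist(proto.flatten(0, 1), action_table)  # pairwise L2 distances
    return dists.argmin(dim=-1).view(proto.shape[0], proto.shape[1])

# Usage: 3 agents, 8-dim observations, 16 discrete actions embedded in R^4.
actor = BiCActor(obs_dim=8, act_dim=4)
obs = torch.randn(2, 3, 8)          # batch of 2 joint observations
table = torch.randn(16, 4)          # one embedding per discrete action
discrete_ids = nearest_discrete_action(actor(obs), table)
print(discrete_ids.shape)           # torch.Size([2, 3])

This sketch covers only the decentralized actor; the centralized critic used during training under the CTDE scheme, and the exact form of the paper's mapping method, are omitted.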

Keywords
Multi-agent deep reinforcement learning, Large discrete joint action space, Cooperative, Mapping method
Published
2021-01-22
Appears in
SpringerLink
http://dx.doi.org/10.1007/978-3-030-67540-0_20