IoT as a Service. 4th EAI International Conference, IoTaaS 2018, Xi’an, China, November 17–18, 2018, Proceedings

Research Article

Actor-Critic for Multi-agent System with Variable Quantity of Agents

  • @INPROCEEDINGS{10.1007/978-3-030-14657-3_5,
        author={Guihong Wang and Jinglun Shi},
        title={Actor-Critic for Multi-agent System with Variable Quantity of Agents},
        proceedings={IoT as a Service. 4th EAI International Conference, IoTaaS 2018, Xi’an, China, November 17--18, 2018, Proceedings},
        proceedings_a={IOTAAS},
        year={2019},
        month={3},
        keywords={Multi-agent; Reinforcement learning; Variable quantity of agents; Communication; Fine-tune},
        doi={10.1007/978-3-030-14657-3_5}
    }
    
Guihong Wang1,*, Jinglun Shi1,*
  • 1: South China University of Technology
*Contact email: eew.guihong@mail.scut.edu.cn, shijl@scut.edu.cn

Abstract

Reinforcement learning (RL) has recently been applied to many cooperative multi-agent systems. However, most research has been carried out on systems with a fixed number of agents. In reality, the number of agents in a system often changes over time, and the majority of multi-agent reinforcement learning (MARL) models cannot work robustly on such systems. In this paper, we propose a model, extended from the actor-critic framework, that handles systems with a variable quantity of agents. To deal with the variable-quantity issue, we design a feature extractor that embeds variable-length states. By employing a bidirectional long short-term memory (BLSTM) network in the actor, which is capable of processing variable-length sequences, any number of agents can communicate and coordinate with one another. Because a BLSTM is designed to process sequences, we use the critic network as an importance estimator for all agents and organize them into a sequence accordingly. Experiments show that our model works well when the quantity of agents varies and outperforms other models. Although the model may perform poorly when the quantity becomes too large, it can be fine-tuned, without changing hyper-parameters, to achieve acceptable performance in a short time.
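To make the abstract's architecture concrete, the following is a minimal sketch in PyTorch, assuming an illustrative design rather than the authors' implementation (all class and parameter names here are hypothetical): a shared feature extractor embeds each agent's observation, critic-style importance scores order the agents into a sequence, and a BLSTM over that sequence lets a variable number of agents exchange information before per-agent actions are produced.

    import torch
    import torch.nn as nn

    class VariableAgentActor(nn.Module):
        """Illustrative actor for a variable quantity of agents (not the paper's exact model)."""

        def __init__(self, obs_dim, embed_dim, hidden_dim, action_dim):
            super().__init__()
            # Shared feature extractor: embeds one agent's observation, so the
            # same weights apply no matter how many agents are present.
            self.extractor = nn.Sequential(nn.Linear(obs_dim, embed_dim), nn.ReLU())
            # BLSTM over the agent sequence serves as the communication channel.
            self.blstm = nn.LSTM(embed_dim, hidden_dim,
                                 batch_first=True, bidirectional=True)
            # Per-agent action head on concatenated forward/backward states.
            self.head = nn.Linear(2 * hidden_dim, action_dim)

        def forward(self, obs, importance):
            # obs:        (n_agents, obs_dim)  -- n_agents may differ per call
            # importance: (n_agents,)          -- e.g. critic value estimates
            order = torch.argsort(importance, descending=True)
            feats = self.extractor(obs[order]).unsqueeze(0)  # (1, n, embed_dim)
            out, _ = self.blstm(feats)                       # (1, n, 2*hidden_dim)
            logits = self.head(out.squeeze(0))               # (n, action_dim)
            # Undo the ordering so outputs align with original agent indices.
            inverse = torch.empty_like(order)
            inverse[order] = torch.arange(order.numel())
            return logits[inverse]

    # Usage: the same actor handles 3 agents now and 7 agents later.
    actor = VariableAgentActor(obs_dim=10, embed_dim=32, hidden_dim=64, action_dim=5)
    for n in (3, 7):
        obs = torch.randn(n, 10)
        importance = torch.randn(n)  # stand-in for the critic's estimates
        print(actor(obs, importance).shape)  # -> torch.Size([n, 5])

The key design point the abstract highlights is visible here: because the BLSTM consumes a sequence of arbitrary length and the extractor weights are shared across agents, no layer's shape depends on the number of agents, so agents can join or leave without changing the network.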