IoT as a Service. 4th EAI International Conference, IoTaaS 2018, Xi’an, China, November 17–18, 2018, Proceedings

Research Article

Actor-Critic for Multi-agent System with Variable Quantity of Agents

  • @INPROCEEDINGS{10.1007/978-3-030-14657-3_5,
        author={Guihong Wang and Jinglun Shi},
        title={Actor-Critic for Multi-agent System with Variable Quantity of Agents},
        proceedings={IoT as a Service. 4th EAI International Conference, IoTaaS 2018, Xi’an, China, November 17--18, 2018, Proceedings},
        proceedings_a={IOTAAS},
        year={2019},
        month={3},
        keywords={Multi-agent; Reinforcement learning; Variable quantity of agents; Communication; Fine-tune},
        doi={10.1007/978-3-030-14657-3_5}
    }
    
Guihong Wang1,*, Jinglun Shi1,*
  • 1: South China University of Technology
*Contact email: eew.guihong@mail.scut.edu.cn, shijl@scut.edu.cn

Abstract

Reinforcement learning (RL) has recently been applied to many cooperative multi-agent systems. However, most research has been carried out on systems with a fixed number of agents. In reality, the number of agents in a system often changes over time, and the majority of multi-agent reinforcement learning (MARL) models cannot work robustly on such systems. In this paper, we propose a model, extended from the actor-critic framework, that handles systems with a variable quantity of agents. To deal with the variable-quantity issue, we design a feature extractor that embeds variable-length states. By employing a bidirectional long short-term memory (BLSTM) network in the actor, which is capable of processing variable-length sequences, any number of agents can communicate and coordinate with one another. Because a BLSTM is designed to process sequences, we use the critic network as an importance estimator for all agents and organize them into a sequence accordingly. Experiments show that our model works well when the quantity of agents varies and outperforms other models. Although the model may perform poorly when the quantity becomes too large, it can be fine-tuned, without changing hyper-parameters, to achieve acceptable performance in a short time.
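To make the abstract's architecture concrete, the following is a minimal sketch in PyTorch, assuming an illustrative design rather than the authors' implementation (all class and parameter names here are hypothetical): a shared feature extractor embeds each agent's observation, critic-style importance scores order the agents into a sequence, and a BLSTM over that sequence lets a variable number of agents exchange information before per-agent actions are produced.

    import torch
    import torch.nn as nn

    class VariableAgentActor(nn.Module):
        """Illustrative actor for a variable quantity of agents (not the paper's exact model)."""

        def __init__(self, obs_dim, embed_dim, hidden_dim, action_dim):
            super().__init__()
            # Shared feature extractor: embeds one agent's observation, so the
            # same weights apply no matter how many agents are present.
            self.extractor = nn.Sequential(nn.Linear(obs_dim, embed_dim), nn.ReLU())
            # BLSTM over the agent sequence serves as the communication channel.
            self.blstm = nn.LSTM(embed_dim, hidden_dim,
                                 batch_first=True, bidirectional=True)
            # Per-agent action head on concatenated forward/backward states.
            self.head = nn.Linear(2 * hidden_dim, action_dim)

        def forward(self, obs, importance):
            # obs:        (n_agents, obs_dim)  -- n_agents may differ per call
            # importance: (n_agents,)          -- e.g. critic value estimates
            order = torch.argsort(importance, descending=True)
            feats = self.extractor(obs[order]).unsqueeze(0)  # (1, n, embed_dim)
            out, _ = self.blstm(feats)                       # (1, n, 2*hidden_dim)
            logits = self.head(out.squeeze(0))               # (n, action_dim)
            # Undo the ordering so outputs align with original agent indices.
            inverse = torch.empty_like(order)
            inverse[order] = torch.arange(order.numel())
            return logits[inverse]

    # Usage: the same actor handles 3 agents now and 7 agents later.
    actor = VariableAgentActor(obs_dim=10, embed_dim=32, hidden_dim=64, action_dim=5)
    for n in (3, 7):
        obs = torch.randn(n, 10)
        importance = torch.randn(n)  # stand-in for the critic's estimates
        print(actor(obs, importance).shape)  # -> torch.Size([n, 5])

The key design point the abstract highlights is visible here: because the BLSTM consumes a sequence of arbitrary length and the extractor weights are shared across agents, no layer's shape depends on the number of agents, so agents can join or leave without changing the network.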