
Research Article
QBRT: Bias and Rising Threshold Algorithm with Q-Learning
@INPROCEEDINGS{10.1007/978-3-031-29126-5_4,
  author    = {Ryo Ogino and Masao Kubo and Hiroshi Sato},
  title     = {QBRT: Bias and Rising Threshold Algorithm with Q-Learning},
  booktitle = {Artificial Intelligence for Communications and Networks. 4th EAI International Conference, AICON 2022, Hiroshima, Japan, November 30 - December 1, 2022, Proceedings},
  publisher = {Springer},
  year      = {2023},
  month     = {3},
  keywords  = {Multi-agent reinforcement learning; Best-of-n problem; Tower of Hanoi},
  doi       = {10.1007/978-3-031-29126-5_4}
}
Ryo Ogino
Masao Kubo
Hiroshi Sato
Year: 2023
QBRT: Bias and Rising Threshold Algorithm with Q-Learning
AICON
Springer
DOI: 10.1007/978-3-031-29126-5_4
Abstract
In multi-agent reinforcement learning, the problems of environmental non-stationarity and scalability have long been recognized. As a first step toward solving these problems, this paper proposes a learning model, the BRT Algorithm with Q-Learning (hereafter QBRT), based on the Bias and Rising Threshold (hereafter BRT) algorithm, which can solve best-of-n problems where the number of options n is greater than 2 (hereafter, best-of-n problems (n >> 2)). This model is characterized by the fact that all agents that make up the herd agree in advance on what action the herd will take next. We expect that the problem of non-stationarity can be ameliorated to some extent by having all agents follow the same policy. On the other hand, the time it takes for agents to reach agreement with each other generally tends to increase as the number of agents increases. By contrast, if BRT is used as a base, the time required for agreement can be kept almost constant even as the number of agents increases. We validate this with an experiment using Tower of Hanoi by Multiagent (hereafter THM), a best-of-n problem (n >> 2) and flock coordination problem based on the classic puzzle “Tower of Hanoi”.
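The abstract names standard Q-learning as the learning component that QBRT layers on top of BRT. The paper's own combination of the two is not detailed here, so the following is only a generic sketch of the tabular Q-learning backup (the `q_update` function, state names, and learning-rate/discount values are illustrative assumptions, not taken from the paper):

```python
# Illustrative tabular Q-learning backup:
#   Q(s, a) <- Q(s, a) + alpha * (r + gamma * max_a' Q(s', a') - Q(s, a))
# This is the standard update the abstract refers to; QBRT's BRT-based
# consensus step is NOT modeled here.
def q_update(Q, state, action, reward, next_state, alpha=0.1, gamma=0.9):
    # Greedy value of the successor state (0 if it has no actions yet).
    best_next = max(Q[next_state].values()) if Q[next_state] else 0.0
    td_target = reward + gamma * best_next
    Q[state][action] += alpha * (td_target - Q[state][action])
    return Q[state][action]

# Tiny two-state example table (hypothetical values).
Q = {"s0": {"a": 0.0, "b": 0.0}, "s1": {"a": 1.0, "b": 0.0}}
q_update(Q, "s0", "a", reward=0.5, next_state="s1")
```

In a multi-agent setting, each agent running such an update makes the environment non-stationary from every other agent's viewpoint, which is the difficulty the abstract says the shared, pre-agreed policy is meant to ease.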