Artificial Intelligence for Communications and Networks. 4th EAI International Conference, AICON 2022, Hiroshima, Japan, November 30 - December 1, 2022, Proceedings

Research Article

QBRT: Bias and Rising Threshold Algorithm with Q-Learning

@INPROCEEDINGS{10.1007/978-3-031-29126-5_4,
    author={Ryo Ogino and Masao Kubo and Hiroshi Sato},
    title={QBRT: Bias and Rising Threshold Algorithm with Q-Learning},
    proceedings={Artificial Intelligence for Communications and Networks. 4th EAI International Conference, AICON 2022, Hiroshima, Japan, November 30 - December 1, 2022, Proceedings},
    proceedings_a={AICON},
    publisher={Springer},
    year={2023},
    month={3},
    keywords={Multi-agent reinforcement learning, Best-of-n problem, Tower of Hanoi},
    doi={10.1007/978-3-031-29126-5_4}
}
Ryo Ogino*, Masao Kubo, Hiroshi Sato
Affiliation: National Defense Academy, 1-10-20, Hashirimizu
*Contact email: em60010@nda.ac.jp

Abstract

In multi-agent reinforcement learning, the non-stationarity of the environment and poor scalability have long been recognized as problems. As a first step toward addressing them, this paper proposes a learning model, the BRT Algorithm with Q-Learning (hereafter QBRT), based on the Bias and Rising Threshold (hereafter BRT) algorithm, which can solve best-of-n problems in which the number of options n is greater than 2 (hereafter, best-of-n (n >> 2) problems). The model is characterized by the fact that all of the agents that make up the flock agree in advance on what action the flock will take next. We expect that the non-stationarity problem can be mitigated to some extent by having all agents follow the same policy. On the other hand, the time it takes for agents to reach agreement with each other generally tends to increase with the number of agents. With BRT as a base, in contrast, the time required for agreement can be kept almost constant even as the number of agents grows. We validate the approach with an experiment using Tower of Hanoi by Multiagent (hereafter THM), a flock coordination problem and best-of-n (n >> 2) problem based on the classic puzzle "Tower of Hanoi".
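The abstract names two ingredients, a Q-learning component and a BRT-style agreement step, without giving QBRT's equations. The sketch below is therefore only an illustrative interpretation, not the authors' method: `q_update` is the standard tabular Q-learning rule, and `brt_consensus` is a hypothetical consensus loop in which the flock accepts the majority option once its support clears a threshold that relaxes each round, so agreement time stays roughly bounded regardless of flock size. All function names, parameters, and constants here are assumptions.

```python
import random
from collections import defaultdict, Counter

def q_update(q, state, action, reward, next_state, alpha=0.1, gamma=0.9):
    """Standard tabular Q-learning update:
    Q(s,a) <- Q(s,a) + alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))."""
    best_next = max(q[next_state].values()) if q[next_state] else 0.0
    q[state][action] += alpha * (reward + gamma * best_next - q[state][action])

def brt_consensus(votes, rounds=10, start=0.9, step=0.1):
    """Toy rising-threshold agreement (an assumption, not the paper's BRT):
    accept the current majority option once its support fraction exceeds a
    threshold that relaxes each round; minority agents drift toward the
    majority, so the loop terminates in a bounded number of rounds
    independent of the number of agents."""
    n = len(votes)
    threshold = start
    for _ in range(rounds):
        option, count = Counter(votes).most_common(1)[0]
        if count / n >= threshold:
            return option  # flock commits to this option in advance
        # minority agents switch toward the current majority with prob. 0.5
        votes = [option if random.random() < 0.5 else v for v in votes]
        threshold = max(0.5, threshold - step)
    return Counter(votes).most_common(1)[0][0]

if __name__ == "__main__":
    # One learning step on a hypothetical THM-like state space.
    q = defaultdict(lambda: defaultdict(float))
    q_update(q, "s0", "move_disk_A_to_B", 1.0, "s1")
    print(q["s0"]["move_disk_A_to_B"])  # 0.1 after one update from zero

    # The flock agrees on one action before anyone acts.
    agreed = brt_consensus(["move_A"] * 8 + ["move_B"] * 2)
    print(agreed)
```

The key design point mirrored from the abstract is that consensus happens *before* action: every agent then executes the same agreed policy step, which is what is hoped to dampen the non-stationarity each learner would otherwise see.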

Keywords
Multi-agent reinforcement learning, Best-of-n problem, Tower of Hanoi
Published
2023-03-26
Appears in
SpringerLink
http://dx.doi.org/10.1007/978-3-031-29126-5_4
Copyright © 2022–2025 ICST