About | Contact Us | Register | Login
ProceedingsSeriesJournalsSearchEAI
Performance Evaluation Methodologies and Tools. 14th EAI International Conference, VALUETOOLS 2021, Virtual Event, October 30–31, 2021, Proceedings

Research Article

A Novel Implementation of Q-Learning for the Whittle Index

Download(Requires a free EAI acccount)
8 downloads
Cite
BibTeX Plain Text
  • @INPROCEEDINGS{10.1007/978-3-030-92511-6_10,
        author={Lachlan J. Gibson and Peter Jacko and Yoni Nazarathy},
        title={A Novel Implementation of Q-Learning for the Whittle Index},
        proceedings={Performance Evaluation Methodologies and Tools. 14th EAI International Conference, VALUETOOLS 2021, Virtual Event, October 30--31, 2021, Proceedings},
        proceedings_a={VALUETOOLS},
        year={2021},
        month={12},
        keywords={Multi-armed bandits Restless bandits Reinforcement learning Q-learning Markov decision processes},
        doi={10.1007/978-3-030-92511-6_10}
    }
    
  • Lachlan J. Gibson
    Peter Jacko
    Yoni Nazarathy
    Year: 2021
    A Novel Implementation of Q-Learning for the Whittle Index
    VALUETOOLS
    Springer
    DOI: 10.1007/978-3-030-92511-6_10
Lachlan J. Gibson1,*, Peter Jacko, Yoni Nazarathy1
  • 1: School of Mathematics and Physics
*Contact email: l.gibson1@uq.edu.au

Abstract

We develop a method for learning index rules for multi-armed bandits, restless bandits, and dynamic resource allocation where the underlying transition probabilities and reward structure of the system is not known. Our approach builds on an understanding of both stochastic optimisation (specifically, the Whittle index) and reinforcement learning (specifically, Q-learning). We propose a novel implementation of Q-learning, which exploits the structure of the problem considered, in which the algorithm maintains two sets of Q-values for each project: one for reward and one for resource consumption. Based on these ideas we design a learning algorithm and illustrate its performance by comparing it to the state-of-the-art Q-learning algorithm for the Whittle index by Avrachenkov and Borkar. Both algorithms rely on Q-learning to estimate the Whittle index policy, however the nature in which Q-learning is used in each algorithm is dramatically different. Our approach seems to be able to deliver similar or better performance and is potentially applicable to a much broader and more general set of problems.

Keywords
Multi-armed bandits Restless bandits Reinforcement learning Q-learning Markov decision processes
Published
2021-12-08
Appears in
SpringerLink
http://dx.doi.org/10.1007/978-3-030-92511-6_10
Copyright © 2021–2025 ICST
EBSCOProQuestDBLPDOAJPortico
EAI Logo

About EAI

  • Who We Are
  • Leadership
  • Research Areas
  • Partners
  • Media Center

Community

  • Membership
  • Conference
  • Recognition
  • Sponsor Us

Publish with EAI

  • Publishing
  • Journals
  • Proceedings
  • Books
  • EUDL