Characterization and Computation of Restless Bandit Marginal Productivity Indices

José Niño-Mora

1st International ICST Workshop on Tools for solving Structured Markov Chains

Research Article

Characterization and Computation of Restless Bandit Marginal Productivity Indices

Download1168 downloads

Cite: BibTeX Plain Text

@INPROCEEDINGS{10.4108/smctools.2007.1993,
    author={Jos\^{e} Ni\`{o}o-Mora},
    title={Characterization and Computation of Restless Bandit Marginal Productivity Indices},
    proceedings={1st International ICST Workshop on Tools for solving Structured Markov Chains},
    proceedings_a={SMCTOOLS},
    year={2010},
    month={5},
    keywords={Markov decision processes restless bandits marginal productivity index index policies block algorithms},
    doi={10.4108/smctools.2007.1993}
}

José Niño-Mora
Year: 2010
Characterization and Computation of Restless Bandit Marginal Productivity Indices
SMCTOOLS
ICST
DOI: 10.4108/smctools.2007.1993

José Niño-Mora¹^,*

1: Department of Statistics Universidad Carlos III de Madrid C/ Madrid 126 28903 Getafe (Madrid), Spain

*Contact email: jnimora@alum.mit.edu

Abstract

The restless bandit problem furnishes a powerful modeling paradigm for settings involving the optimal dynamic priority allocation to multiple stochatic projects, given as binary-action (active/passive) Markov decision processes (MDPs). Though generally intractable, Whittle (1988) introduced a tractable priority-index policy, which has been developed in recent work by the author in an extended framework based on the unifying concept of marginal productivity index (MPI). A growing body of empirical evidence shows MPI policies to be nearly optimal in diverse applications, which motivates the interest to compute efficiently the MPI. For such a purpose, we extend to restless bandits the parametric linear programming (LP) approach deployed in [J. Niño-Mora. A (2/3)n3 fast-pivoting algorithm for the Gittins index and optimal stopping of a Markov chain, INFORMS J. Comp., in press] for classic (nonrestless) bandits. Yet the extension is not straightforward, as the MPI is only defined for the restricted range of so-called indexable bandits, which motivates the quest for methods to establish indexability. This paper furnishes algorithmic and analytical tools to realize the potential of MPI policies in large-scale applications, presenting the following contributions: (i) an algorithmic characterization of indexability, for which two block implementations are given; and (ii) new analytical conditions for indexability --- termed LP-indexability --- that leverage knowledge on the structure of optimal policies, under which the MPI is computed faster by the adaptive-greedy algorithm previously introduced by the author under more stringent (PCL-indexability) conditions, for which a new fast-pivoting block implementation is given. The paper further reports on a computational study, which measures the runtime performance of the algorithms and demonstrates the high prevalence of indexability and PCL-indexability.

Keywords: Markov decision processes, restless bandits, marginal productivity index, index policies, block algorithms

Published: 2010-05-16
Modified: 2011-09-18

: http://dx.doi.org/10.4108/smctools.2007.1993

Characterization and Computation of Restless Bandit Marginal Productivity Indices

Abstract

About EAI

Community

Publish with EAI