Research Article
Characterization and Computation of Restless Bandit Marginal Productivity Indices
@INPROCEEDINGS{10.4108/smctools.2007.1993, author={Jos\^{e} Ni\`{o}o-Mora}, title={Characterization and Computation of Restless Bandit Marginal Productivity Indices}, proceedings={1st International ICST Workshop on Tools for solving Structured Markov Chains}, proceedings_a={SMCTOOLS}, year={2010}, month={5}, keywords={Markov decision processes restless bandits marginal productivity index index policies block algorithms}, doi={10.4108/smctools.2007.1993} }
- José Niño-Mora
Year: 2010
Characterization and Computation of Restless Bandit Marginal Productivity Indices
SMCTOOLS
ICST
DOI: 10.4108/smctools.2007.1993
Abstract
The restless bandit problem furnishes a powerful modeling paradigm for settings involving the optimal dynamic priority allocation to multiple stochatic projects, given as binary-action (active/passive) Markov decision processes (MDPs). Though generally intractable, Whittle (1988) introduced a tractable priority-index policy, which has been developed in recent work by the author in an extended framework based on the unifying concept of marginal productivity index (MPI). A growing body of empirical evidence shows MPI policies to be nearly optimal in diverse applications, which motivates the interest to compute efficiently the MPI. For such a purpose, we extend to restless bandits the parametric linear programming (LP) approach deployed in [J. Niño-Mora. A (2/3)n3 fast-pivoting algorithm for the Gittins index and optimal stopping of a Markov chain, INFORMS J. Comp., in press] for classic (nonrestless) bandits. Yet the extension is not straightforward, as the MPI is only defined for the restricted range of so-called indexable bandits, which motivates the quest for methods to establish indexability. This paper furnishes algorithmic and analytical tools to realize the potential of MPI policies in large-scale applications, presenting the following contributions: (i) an algorithmic characterization of indexability, for which two block implementations are given; and (ii) new analytical conditions for indexability --- termed LP-indexability --- that leverage knowledge on the structure of optimal policies, under which the MPI is computed faster by the adaptive-greedy algorithm previously introduced by the author under more stringent (PCL-indexability) conditions, for which a new fast-pivoting block implementation is given. The paper further reports on a computational study, which measures the runtime performance of the algorithms and demonstrates the high prevalence of indexability and PCL-indexability.