1st International ICST Workshop on Tools for solving Structured Markov Chains

Research Article

Characterization and Computation of Restless Bandit Marginal Productivity Indices

Download190 downloads
  • @INPROCEEDINGS{10.4108/smctools.2007.1993,
        author={Jos\^{e} Ni\`{o}o-Mora},
        title={Characterization and Computation of Restless Bandit Marginal Productivity Indices},
        proceedings={1st International ICST Workshop on Tools for solving Structured Markov Chains},
        proceedings_a={SMCTOOLS},
        year={2010},
        month={5},
        keywords={Markov decision processes restless bandits marginal productivity index index policies block algorithms},
        doi={10.4108/smctools.2007.1993}
    }
    
  • José Niño-Mora
    Year: 2010
    Characterization and Computation of Restless Bandit Marginal Productivity Indices
    SMCTOOLS
    ICST
    DOI: 10.4108/smctools.2007.1993
José Niño-Mora1,*
  • 1: Department of Statistics Universidad Carlos III de Madrid C/ Madrid 126 28903 Getafe (Madrid), Spain
*Contact email: jnimora@alum.mit.edu

Abstract

The restless bandit problem furnishes a powerful modeling paradigm for settings involving the optimal dynamic priority allocation to multiple stochatic projects, given as binary-action (active/passive) Markov decision processes (MDPs). Though generally intractable, Whittle (1988) introduced a tractable priority-index policy, which has been developed in recent work by the author in an extended framework based on the unifying concept of marginal productivity index (MPI). A growing body of empirical evidence shows MPI policies to be nearly optimal in diverse applications, which motivates the interest to compute efficiently the MPI. For such a purpose, we extend to restless bandits the parametric linear programming (LP) approach deployed in [J. Niño-Mora. A (2/3)n3 fast-pivoting algorithm for the Gittins index and optimal stopping of a Markov chain, INFORMS J. Comp., in press] for classic (nonrestless) bandits. Yet the extension is not straightforward, as the MPI is only defined for the restricted range of so-called indexable bandits, which motivates the quest for methods to establish indexability. This paper furnishes algorithmic and analytical tools to realize the potential of MPI policies in large-scale applications, presenting the following contributions: (i) an algorithmic characterization of indexability, for which two block implementations are given; and (ii) new analytical conditions for indexability --- termed LP-indexability --- that leverage knowledge on the structure of optimal policies, under which the MPI is computed faster by the adaptive-greedy algorithm previously introduced by the author under more stringent (PCL-indexability) conditions, for which a new fast-pivoting block implementation is given. The paper further reports on a computational study, which measures the runtime performance of the algorithms and demonstrates the high prevalence of indexability and PCL-indexability.