Multi-Objective Approaches to Markov Decision Processes with Uncertain Transition Parameters

Dimitri Scheftelowitsch; Peter Buchholz; Vahid Hashemi; Holger Hermanns

11th EAI International Conference on Performance Evaluation Methodologies and Tools

Research Article

Multi-Objective Approaches to Markov Decision Processes with Uncertain Transition Parameters

Cite: BibTeX Plain Text

@INPROCEEDINGS{10.4108/eai.5-12-2017.2274638,
    author={Dimitri  Scheftelowitsch and Peter  Buchholz and Vahid  Hashemi and Holger  Hermanns},
    title={Multi-Objective Approaches to Markov Decision Processes with Uncertain Transition Parameters},
    proceedings={11th EAI International Conference on Performance Evaluation Methodologies and Tools},
    publisher={ACM},
    proceedings_a={VALUETOOLS},
    year={2018},
    month={8},
    keywords={markov decision processes bounded parameter uncertainty multi-objective},
    doi={10.4108/eai.5-12-2017.2274638}
}

Dimitri Scheftelowitsch
Peter Buchholz
Vahid Hashemi
Holger Hermanns
Year: 2018
Multi-Objective Approaches to Markov Decision Processes with Uncertain Transition Parameters
VALUETOOLS
ACM
DOI: 10.4108/eai.5-12-2017.2274638

Dimitri Scheftelowitsch¹^,*, Peter Buchholz¹, Vahid Hashemi², Holger Hermanns²

1: Informatik IV, TU Dortmund
2: Saarland University

*Contact email: dimitri.scheftelowitsch@cs.tu-dortmund.de

Abstract

Markov decision processes (MDPs) are a popular model for performance analysis and optimization of stochastic systems. The parameters of stochastic behavior of MDPs are estimates from empirical observations of a system; their values are not known precisely. Different types of MDPs with uncertain, imprecise or bounded transition rates or probabilities and rewards exist in the literature.

Commonly, analysis of models with uncertainties amounts to searching for the most robust policy which means that the goal is to generate a policy with the greatest lower bound on performance (or, symmetrically, the lowest upper bound on costs). However, hedging against an unlikely worst case may lead to losses in other situations. In general, one is interested in policies that behave well in all situations which results in a multi-objective view on decision making.

In this paper, we consider policies for the expected discounted reward measure of MDPs with uncertain parameters. In particular, the approach is defined for bounded-parameter MDPs (BMDPs). In this setting the worst, best and average case performances of a policy are analyzed simultaneously, which yields a multi-scenario multi-objective optimization problem. The paper presents and evaluates approaches to compute the pure Pareto optimal policies in the value vector space.

Keywords: markov decision processes bounded parameter uncertainty multi-objective

Published: 2018-08-10
Publisher: ACM

: http://dx.doi.org/10.4108/eai.5-12-2017.2274638