Reduction Techniques for Model Checking and Learning in MDPs

#1 Reduction Techniques for Model Checking and Learning in MDPs [PDF] [Copy] [Kimi] [REL]

Authors: Suda Bharadwaj, Stephane Le Roux, Guillermo Perez, Ufuk Topcu

Omega-regular objectives in Markov decision processes (MDPs) reduce to reachability: find a policy which maximizes the probability of reaching a target set of states. Given an MDP, an initial distribution, and a target set of states, such a policy can be computed by most probabilistic model checking tools. If the MDP is only partially specified, i.e., some prob- abilities are unknown, then model-learning techniques can be used to statistically approximate the probabilities and enable the computation of the de- sired policy. For fully specified MDPs, reducing the size of the MDP translates into faster model checking; for partially specified MDPs, into faster learning. We provide reduction techniques that al- low us to remove irrelevant transition probabilities: transition probabilities (known, or to be learned) that do not influence the maximal reachability probability. Among other applications, these reductions can be seen as a pre-processing of MDPs before model checking or as a way to reduce the number of experiments required to obtain a good approximation of an unknown MDP.

Subject: IJCAI.2017 - Planning and Scheduling

597@2017@IJCAI

#1 Reduction Techniques for Model Checking and Learning in MDPs [PDF] [Copy] [Kimi] [REL]