Total: 1
Learning to control an unknown dynamical system with respect to high-level temporal specifications is an important problem in control theory. We present the first regret-free online algorithm for learning a controller for linear temporal logic (LTL) specifications for systems with unknown dynamics.We assume that the underlying (unknown) dynamics is modeled by a finite-state and action Markov decision process (MDPs).Our core technical result is a regret-free learning algorithm for infinite-horizon reach-avoid problems on MDPs.For general LTL specifications, we show that the synthesis problem can be reduced to a reach-avoid problem once the graph structure is known.Additionally, we provide an algorithm for learning the graph structure, assuming knowledge of a minimum transition probability, which operates independently of the main regret-free algorithm. Our LTL controller synthesis algorithm provides sharp bounds on how close we are to achieving optimal behavior after a finite number of learning episodes.In contrast, previous algorithms for LTL synthesis only provide asymptotic guarantees, which give no insight into the transient performance during the learning phase.