Learning in Markov Games: can we exploit a general-sum opponent?

Authors: Giorgia Ramponi, Marcello Restelli

In this paper, we study the learning problem in two-player general-sum Markov Games. We consider the online setting in which we control a single player that plays against an arbitrary opponent and aims to minimize regret. Previous works consider only zero-sum Markov Games, in which the two agents are completely adversarial. However, in some cases the two agents may have different reward functions without having conflicting objectives. This class of games, called general-sum Markov Games, is far from being well understood, and it involves a stronger notion of regret than the one used in previous works. We show that the new regret minimization problem is significantly harder than in standard Markov Decision Processes and zero-sum Markov Games. To do this, we derive a lower bound on the expected regret of any "good" learning strategy; the bound shows a constant dependence on the number of deterministic policies, which is not present in zero-sum Markov Games or Markov Decision Processes. We then propose a novel optimistic algorithm that nearly matches the proposed lower bound. Proving these results requires overcoming several new challenges that are not present in Markov Decision Processes or zero-sum Markov Games.

Subject: UAI.2022 - Oral

S3NzSD8icx9@OpenReview
