Pareto Q-Learning with Reward Machines

2606.19134

Authors: Arnaud Lequen, Clément Legrand-Lixon, Léo Saulières

We present Pareto Q-Learning with Reward Machines (PQLRM), a multi-objective reinforcement learning algorithm for tasks whose reward structure is specified by a set of reward machines (RMs). PQLRM combines Pareto Q-Learning (PQL), which maintains sets of vector-valued Q-estimates to approximate the Pareto front, with enhancements from Q-Learning with Reward Machines (QRM), which exploits the factored automaton structure of the reward signal. This yields a multi-policy algorithm that remains sample-efficient under non-Markovian, RM-encoded rewards. Experimental trials show that PQLRM converges faster than a naive PQL baseline applied to the cross-product MDP and can synthesize Pareto-optimal policies that QRM cannot.
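
The abstract describes PQL as maintaining sets of vector-valued Q-estimates that approximate the Pareto front. As a minimal illustration of that one idea, and not the authors' implementation, the following Python sketch filters a set of reward vectors down to its non-dominated members. The names `dominates` and `pareto_front` are hypothetical, and the sketch assumes every objective is maximized.

```python
from typing import Iterable, List, Tuple

Vector = Tuple[float, ...]


def dominates(a: Vector, b: Vector) -> bool:
    """Return True if a Pareto-dominates b (all objectives maximized).

    a dominates b when a is at least as good in every objective
    and strictly better in at least one.
    """
    at_least_as_good = all(x >= y for x, y in zip(a, b))
    strictly_better = any(x > y for x, y in zip(a, b))
    return at_least_as_good and strictly_better


def pareto_front(vectors: Iterable[Vector]) -> List[Vector]:
    """Keep only the non-dominated vectors, preserving first-seen order."""
    # Drop exact duplicates while keeping insertion order.
    unique = list(dict.fromkeys(vectors))
    return [v for v in unique if not any(dominates(u, v) for u in unique)]


# Hypothetical two-objective value estimates for one state-action pair.
candidates = [(1.0, 2.0), (2.0, 1.0), (0.0, 0.0), (1.0, 2.0)]
front = pareto_front(candidates)
```

Here `(0.0, 0.0)` is dropped because `(1.0, 2.0)` dominates it, while `(1.0, 2.0)` and `(2.0, 1.0)` are both kept because neither dominates the other. This quadratic filter is only a sketch of the set-pruning step; how PQLRM combines such sets with the reward-machine structure is specified in the paper itself.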

Subjects: Machine Learning, Artificial Intelligence

Publish: 2026-06-17 14:44:31 UTC
