ByzRO25Bjr@OpenReview

Total: 1

#1 Deployment Efficient Reward-Free Exploration with Linear Function Approximation [PDF1]

Authors: Zihan Zhang, Yuxin Chen, Jason D. Lee, Simon Shaolei Du, Lin Yang, Ruosong Wang

We study deployment-efficient reward-free exploration with linear function approximation, where the goal is to explore a linear Markov Decision Process (MDP) without access to the reward function, while minimizing the number of distinct policies deployed during learning. By ``deployment efficient'', we mean algorithms that require only a small number of policy deployments during exploration -- a crucial property in real-world applications where each deployment is costly or disruptive. We design a novel reinforcement learning algorithm that achieves near-optimal deployment efficiency for linear MDPs in the reward-free setting, using at most $H$ exploration policies during execution (where $H$ is the horizon length), while maintaining sample complexity polynomial in the feature dimension and the horizon length. Unlike previous approaches with similar deployment-efficiency guarantees, our algorithm's sample complexity does not depend on the reachability or explorability coefficients of the underlying MDP, which can be arbitrarily small and lead to unbounded sample complexity in certain cases -- thereby directly addressing an open problem from prior work. Our technical contributions include a data-dependent method for truncating state-action pairs in linear MDPs, efficient offline policy evaluation and optimization algorithms for these truncated MDPs, and a careful integration of these components that implements reward-free exploration with linear function approximation without sacrificing deployment efficiency.
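
To make the setting concrete, below is a minimal sketch in Python of a generic deployment-efficient reward-free exploration loop: $H$ deployments with one fixed exploration policy per horizon step, followed by offline least-squares value iteration once an arbitrary reward function is revealed. It uses one-hot features, so the linear MDP reduces to a tabular instance. The names (`explore`, `plan_lsvi`), the elliptical-bonus exploration rule, and the omission of clipping/bonus terms in planning are illustrative assumptions; this is not the paper's truncation-based algorithm.

```python
import numpy as np

# Illustrative sketch only: a generic reward-free exploration loop with
# linear (here, one-hot) features and H deployments, not the paper's method.

rng = np.random.default_rng(0)

S, A, H = 5, 3, 4          # states, actions, horizon
d = S * A                  # feature dimension (one-hot over state-action pairs)
N = 200                    # trajectories collected per deployment
lam = 1.0                  # ridge regularization

P = rng.dirichlet(np.ones(S), size=(H, S, A))   # unknown transition kernel

def phi(s, a):
    """One-hot feature for a state-action pair."""
    v = np.zeros(d)
    v[s * A + a] = 1.0
    return v

def explore():
    """Run H deployments; the policy at deployment h is greedy w.r.t. an
    elliptical uncertainty bonus at step h and acts uniformly at other steps,
    so exactly one exploration policy is deployed per horizon step."""
    data = [[] for _ in range(H)]                # (s, a, s') tuples per step
    Lambda = [lam * np.eye(d) for _ in range(H)] # per-step feature covariances
    for h_dep in range(H):
        Lam_inv = np.linalg.inv(Lambda[h_dep])   # fixed during the deployment
        for _ in range(N):
            s = rng.integers(S)
            for h in range(H):
                if h == h_dep:                   # bonus-greedy at the targeted step
                    bonuses = [phi(s, a) @ Lam_inv @ phi(s, a) for a in range(A)]
                    a = int(np.argmax(bonuses))
                else:                            # uniform elsewhere
                    a = rng.integers(A)
                s_next = rng.choice(S, p=P[h, s, a])
                data[h].append((s, a, s_next))
                Lambda[h] += np.outer(phi(s, a), phi(s, a))
                s = s_next
    return data, Lambda

def plan_lsvi(data, Lambda, reward):
    """Offline least-squares value iteration on the logged data, run only
    after a reward function reward[h, s, a] is revealed (no reward signal
    is observed during exploration)."""
    V = np.zeros(S)
    for h in reversed(range(H)):
        # Ridge regression of the next-step value onto features.
        target = sum(V[s_next] * phi(s, a) for s, a, s_next in data[h])
        w = np.linalg.solve(Lambda[h], target)
        Q = np.array([[reward[h, s, a] + phi(s, a) @ w for a in range(A)]
                      for s in range(S)])
        V = Q.max(axis=1)
    return V

data, Lambda = explore()
reward = rng.random((H, S, A))                   # reward revealed after exploration
print("estimated optimal values:", np.round(plan_lsvi(data, Lambda, reward), 2))
```

In this sketch the number of deployed policies equals $H$ by construction, matching the deployment budget stated above, while the sample complexity is governed by the per-deployment trajectory count $N$.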

Subject: NeurIPS.2025 - Poster