Catching Two Birds with One Stone: Reward Shaping with Dual Random Networks for Balancing Exploration and Exploitation

#1 Catching Two Birds with One Stone: Reward Shaping with Dual Random Networks for Balancing Exploration and Exploitation [PDF] [Copy] [Kimi] [REL]

Authors: Haozhe Ma, Fangling Li, Jing Lim, Zhengding Luo, Thanh Vinh Vo, Tze-Yun Leong

Existing reward shaping techniques for sparse-reward reinforcement learning generally fall into two categories: novelty-based exploration bonuses and significance-based hidden state values. The former promotes exploration but can lead to distraction from task objectives, while the latter facilitates stable convergence but often lacks sufficient early exploration. To address these limitations, we propose Dual Random Networks Distillation (DuRND), a novel reward shaping framework that efficiently balances exploration and exploitation in a unified mechanism. DuRND leverages two lightweight random network modules to simultaneously compute two complementary rewards: a novelty reward to encourage directed exploration and a contribution reward to assess progress toward task completion. With low computational overhead, DuRND excels in high-dimensional environments with challenging sparse rewards, such as Atari, VizDoom, and MiniWorld, outperforming several benchmarks.

Subject: ICML.2025 - Poster

YqtgKdW9dD@OpenReview

#1 Catching Two Birds with One Stone: Reward Shaping with Dual Random Networks for Balancing Exploration and Exploitation [PDF] [Copy] [Kimi] [REL]