Disentangling Exploration of Large Language Models by Optimal Exploitation

Authors: Tim Grams, Patrick Betz, Sascha Marton, Stefan Lüdtke, Christian Bartelt

Exploration is a crucial skill for in-context reinforcement learning in unknown environments. However, it remains unclear whether large language models can effectively explore a partially hidden state space. This work isolates exploration as the sole objective, tasking an agent with gathering information that enhances future returns. Within this framework, we argue that measuring agent returns alone is not sufficient for a fair evaluation. Hence, we decompose missing rewards into their exploration and exploitation components based on the optimal achievable return. Experiments with various models reveal that most models struggle to explore the state space and that weak exploration is insufficient. Nevertheless, we find a positive correlation between exploration performance and reasoning capabilities. Our decomposition provides insight into behavioral differences driven by prompt engineering, offering a valuable tool for refining performance in exploratory tasks.
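The abstract does not specify the environment or how the optimal return is computed, so the following is only a minimal sketch of the decomposition idea under a loudly hypothetical toy setting: each state carries a fixed reward, and an agent may collect at most `budget` rewards, so the optimal achievable return over any set of states is the sum of its `budget` largest rewards. The function names `best_return` and `decompose_missing_reward` and the budgeted-reward setting are illustrative assumptions, not the authors' environment or code. The missing reward (optimal return minus achieved return) splits into an exploration part (optimal return minus the best return available from explored states) and an exploitation part (best explored return minus achieved return); the two parts sum to the total by construction.

```python
from typing import Dict, Iterable, List, Set, Tuple


def best_return(rewards: Dict[str, float], states: Iterable[str], budget: int) -> float:
    """Optimal return in the hypothetical toy setting: collect the `budget`
    largest rewards among `states`."""
    values = sorted((rewards[s] for s in states), reverse=True)
    return sum(values[:budget])


def decompose_missing_reward(
    rewards: Dict[str, float],
    explored: Set[str],
    collected: List[str],
    budget: int,
) -> Tuple[float, float, float]:
    """Split missing reward into exploration and exploitation components.

    Assumption: `collected` only contains explored states, so both
    components are non-negative.
    """
    # Optimal return over the full, partially hidden state space.
    optimal = best_return(rewards, rewards.keys(), budget)
    # Optimal return restricted to the states the agent actually explored.
    optimal_explored = best_return(rewards, explored, budget)
    # Return the agent actually achieved.
    achieved = sum(rewards[s] for s in collected)

    exploration_gap = optimal - optimal_explored    # reward lost by not discovering states
    exploitation_gap = optimal_explored - achieved  # reward lost despite having the information
    missing = optimal - achieved                    # equals exploration_gap + exploitation_gap
    return missing, exploration_gap, exploitation_gap


if __name__ == "__main__":
    rewards = {"a": 5.0, "b": 3.0, "c": 8.0, "d": 1.0}
    # The agent never discovered "c" and then collected only "b".
    print(decompose_missing_reward(rewards, {"a", "b", "d"}, ["b"], budget=2))
```

In this toy run the agent misses 10.0 reward in total: 5.0 because it never found the high-reward state `c`, and another 5.0 because it collected `b` instead of the better explored state `a`. A return-only metric would report just the 10.0 and hide which of the two failures occurred.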

Subjects: Machine Learning, Artificial Intelligence, Computation and Language

Published: 2025-01-15 16:30:29 UTC

arXiv: 2501.08925