Real-world human decision-making often relies on strategic planning, where *high-level* goals guide the formulation of sub-goals and subsequent actions, as seen in domains such as healthcare, business, and urban policy. Despite notable successes in controlled settings, conventional reinforcement learning (RL) follows a *bottom-up* paradigm, which can struggle to adapt to real-world complexities such as sparse rewards and limited exploration budgets. While methods like hierarchical RL and environment shaping provide partial solutions, they frequently rely either on ad-hoc designs (e.g., choosing the set of high-level actions) or on purely data-driven discovery of high-level actions that still requires significant exploration. In this paper, we introduce a *top-down* framework for RL that explicitly leverages *human-like strategy* to reduce sample complexity, guide exploration, and enable high-level decision-making. We first formalize the *Strategy Problem*, which frames policy generation as finding distributions over policies that balance *specificity* and *value*. Building on this definition, we propose the *Strategist* agent, an iterative framework that leverages large language models (LLMs) to synthesize domain knowledge into a structured representation of actionable strategies and sub-goals. We further develop a *reward shaping methodology* that translates these natural-language strategies into quantitative feedback for RL methods. Empirically, we demonstrate significantly faster convergence than conventional PPO. Taken together, our findings highlight that *top-down strategic exploration* opens new avenues for enhancing RL on real-world decision problems.
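To make the reward-shaping idea concrete, the sketch below shows one way strategy-derived sub-goals could be converted into quantitative feedback: each sub-goal is represented as a predicate over observations and grants a one-time bonus when first satisfied. The `SubGoal` and `StrategyRewardWrapper` names, the Gymnasium interface, and the bonus scheme are illustrative assumptions, not the paper's actual shaping methodology.

```python
from dataclasses import dataclass
from typing import Callable, List

import gymnasium as gym


@dataclass
class SubGoal:
    """A hypothetical sub-goal distilled from a natural-language strategy."""
    name: str
    achieved: Callable[[object], bool]  # predicate over raw observations
    bonus: float                        # shaping reward granted once on first completion


class StrategyRewardWrapper(gym.Wrapper):
    """Adds a one-time bonus whenever a strategy sub-goal is first achieved.

    Illustrative sketch only; the paper's shaping scheme may differ.
    """

    def __init__(self, env: gym.Env, subgoals: List[SubGoal]):
        super().__init__(env)
        self.subgoals = subgoals
        self._done_flags = [False] * len(subgoals)

    def reset(self, **kwargs):
        # New episode: no sub-goal has been achieved yet.
        self._done_flags = [False] * len(self.subgoals)
        return self.env.reset(**kwargs)

    def step(self, action):
        obs, reward, terminated, truncated, info = self.env.step(action)
        for i, sg in enumerate(self.subgoals):
            if not self._done_flags[i] and sg.achieved(obs):
                self._done_flags[i] = True
                reward += sg.bonus
                info.setdefault("achieved_subgoals", []).append(sg.name)
        return obs, reward, terminated, truncated, info
```

A wrapped environment of this kind could then be trained with any standard policy-gradient implementation (e.g., PPO) without modifying the learning algorithm itself, which is the sense in which the shaped feedback acts as a drop-in signal for existing RL methods.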