Traditional reinforcement learning (RL) typically requires vast amounts of training data to develop effective policies. In contrast, large language models (LLMs) exhibit strong generalization and zero-shot capabilities, but struggle with planning and understanding complex action policies. In this work, we introduce STRATEGIST, a novel approach that integrates the strengths of both methods. Our approach leverages LLMs to learn high-level strategic abstractions, which are then refined and executed by a low-level mechanism, such as Monte Carlo Tree Search (MCTS). STRATEGIST is a generalizable framework that can be trained through population-based self-play simulations and self-improvement, without the need for prior training data. We demonstrate the effectiveness of STRATEGIST in learning optimal policies for competitive, multi-turn games with partial information, including Game of Pure Strategy (GOPS) and multi-agent, hidden-identity discussion games like The Resistance: Avalon. Our results show that agents trained with STRATEGIST outperform those trained with traditional RL methods, other LLM-based skill acquisition techniques, and pre-existing LLM agents across both game environments.
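To make the bi-level design concrete, the sketch below illustrates one plausible reading of the training loop described above: an LLM revises high-level textual strategies, a low-level game-playing routine (standing in for MCTS) evaluates them through population-based self-play, and the fittest strategies survive and are refined. All function names (`llm_improve`, `play_game`, `self_play_fitness`, `strategist_loop`) are hypothetical stubs for illustration, not the paper's actual implementation.

```python
import random

def llm_improve(strategy: str, feedback: str) -> str:
    """Hypothetical: ask an LLM to revise a high-level strategy (a textual
    heuristic) in light of self-play feedback. Stubbed for illustration."""
    return strategy + f" [revised after: {feedback}]"

def play_game(strategy_a: str, strategy_b: str) -> float:
    """Hypothetical: play one game where each side selects moves with an
    MCTS guided by its textual strategy. Stubbed here as a fair coin flip."""
    return 1.0 if random.random() < 0.5 else 0.0

def self_play_fitness(strategy: str, population: list[str], n_games: int = 4) -> float:
    """Average win rate of `strategy` against opponents sampled from the population."""
    opponents = random.choices(population, k=n_games)
    return sum(play_game(strategy, opp) for opp in opponents) / n_games

def strategist_loop(seed_strategies: list[str], generations: int = 3) -> str:
    """Population-based self-improvement: score strategies by self-play,
    keep the fittest half, and let the LLM refine each survivor."""
    population = list(seed_strategies)
    for gen in range(generations):
        scored = sorted(population,
                        key=lambda s: self_play_fitness(s, population),
                        reverse=True)
        survivors = scored[: max(1, len(scored) // 2)]
        children = [llm_improve(s, f"generation {gen} self-play results")
                    for s in survivors]
        population = survivors + children
    return max(population, key=lambda s: self_play_fitness(s, population))

if __name__ == "__main__":
    best = strategist_loop(["prefer high-value cards early", "mirror the opponent"])
    print("best strategy:", best)
```

The key design point this sketch captures is the division of labor: the LLM only ever manipulates strategies at the level of natural-language abstractions, while win-rate signal comes from the low-level search executing those strategies in simulated games, so no prior training data is required.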