Traditional reinforcement learning (RL) typically requires vast amounts of training data to develop effective policies. In contrast, large language models (LLMs) exhibit strong generalization and zero-shot capabilities, but struggle with planning and understanding complex action policies. In this work, we introduce STRATEGIST, a novel approach that integrates the strengths of both methods. Our approach leverages LLMs to learn high-level strategic abstractions, which are then refined and executed by a low-level mechanism, such as Monte Carlo Tree Search (MCTS). STRATEGIST is a generalizable framework that can be trained through population-based self-play simulations and self-improvement, without the need for prior training data. We demonstrate the effectiveness of STRATEGIST in learning optimal policies for competitive, multi-turn games with partial information, including Game of Pure Strategy (GOPS) and multi-agent, hidden-identity discussion games like The Resistance: Avalon. Our results show that agents trained with STRATEGIST outperform those trained with traditional RL methods, other LLM-based skill acquisition techniques, and pre-existing LLM agents across both game environments.
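To make the bi-level design concrete, the sketch below illustrates one plausible reading of the training loop described above: an LLM revises high-level textual strategies, a low-level game-playing routine (standing in for MCTS) evaluates them through population-based self-play, and the fittest strategies survive and are refined. All function names (`llm_improve`, `play_game`, `self_play_fitness`, `strategist_loop`) are hypothetical stubs for illustration, not the paper's actual implementation.

```python
import random

def llm_improve(strategy: str, feedback: str) -> str:
    """Hypothetical: ask an LLM to revise a high-level strategy (a textual
    heuristic) in light of self-play feedback. Stubbed for illustration."""
    return strategy + f" [revised after: {feedback}]"

def play_game(strategy_a: str, strategy_b: str) -> float:
    """Hypothetical: play one game where each side selects moves with an
    MCTS guided by its textual strategy. Stubbed here as a fair coin flip."""
    return 1.0 if random.random() < 0.5 else 0.0

def self_play_fitness(strategy: str, population: list[str], n_games: int = 4) -> float:
    """Average win rate of `strategy` against opponents sampled from the population."""
    opponents = random.choices(population, k=n_games)
    return sum(play_game(strategy, opp) for opp in opponents) / n_games

def strategist_loop(seed_strategies: list[str], generations: int = 3) -> str:
    """Population-based self-improvement: score strategies by self-play,
    keep the fittest half, and let the LLM refine each survivor."""
    population = list(seed_strategies)
    for gen in range(generations):
        scored = sorted(population,
                        key=lambda s: self_play_fitness(s, population),
                        reverse=True)
        survivors = scored[: max(1, len(scored) // 2)]
        children = [llm_improve(s, f"generation {gen} self-play results")
                    for s in survivors]
        population = survivors + children
    return max(population, key=lambda s: self_play_fitness(s, population))

if __name__ == "__main__":
    best = strategist_loop(["prefer high-value cards early", "mirror the opponent"])
    print("best strategy:", best)
```

The key design point this sketch captures is the division of labor: the LLM only ever manipulates strategies at the level of natural-language abstractions, while win-rate signal comes from the low-level search executing those strategies in simulated games, so no prior training data is required.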