Strategist: Self-improvement of LLM Decision Making via Bi-Level Tree Search


Authors: Jonathan Light, Min Cai, Weiqin Chen, Guanzhi Wang, Xiusi Chen, Wei Cheng, Yisong Yue, Ziniu Hu

Traditional reinforcement learning and planning typically require vast amounts of data and training to develop effective policies. In contrast, large language models (LLMs) exhibit strong generalization and zero-shot capabilities but struggle with tasks that require detailed planning and decision-making in complex action spaces. We introduce STRATEGIST, a novel approach that integrates the strengths of both methods. Our approach leverages LLMs to search and update high-level strategies (as text), which are then refined and executed by low-level Monte Carlo Tree Search (MCTS). STRATEGIST is a generalizable framework that optimizes the strategy through population-based self-play simulations without the need for any training data. We demonstrate the effectiveness of STRATEGIST in learning optimal strategies for competitive, multi-turn games with partial information, including Game of Pure Strategy (GOPS) and multi-agent, hidden-identity discussion games like The Resistance: Avalon. Our results show that agents equipped with STRATEGIST outperform those trained with traditional RL methods, other LLM-based skill acquisition techniques, and pre-existing LLM agents across both game environments, and achieve comparable performance against human players.
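As a rough illustration of the population-based self-play idea mentioned in the abstract, the following toy sketch runs a selection loop on a small GOPS variant. It is not the authors' method: the strategy is a single integer bid offset rather than text, `propose_revision` is a hypothetical stand-in that perturbs the offset randomly where the paper would use an LLM, and the low-level MCTS refinement is omitted entirely. The deck size, the tie rule, and all function names are assumptions made for the example.

```python
import random

N_CARDS = 6  # assumption: small deck so the demo runs quickly


def choose_bid(hand, prize, offset):
    """Bid the card in hand closest to prize + offset (lower card on ties)."""
    target = prize + offset
    return min(hand, key=lambda c: (abs(c - target), c))


def play_gops(offset_a, offset_b, rng):
    """Play one GOPS game and return player A's score minus player B's."""
    hand_a = list(range(1, N_CARDS + 1))
    hand_b = list(range(1, N_CARDS + 1))
    prizes = list(range(1, N_CARDS + 1))
    rng.shuffle(prizes)
    diff = 0
    for prize in prizes:
        bid_a = choose_bid(hand_a, prize, offset_a)
        bid_b = choose_bid(hand_b, prize, offset_b)
        hand_a.remove(bid_a)
        hand_b.remove(bid_b)
        if bid_a > bid_b:
            diff += prize
        elif bid_b > bid_a:
            diff -= prize
        # assumption: a tied bid discards the prize
    return diff


def propose_revision(offset, rng):
    """Hypothetical stand-in for an LLM revising a strategy."""
    return offset + rng.choice([-1, 1])


def evaluate(candidate, population, rng, games=20):
    """Average score margin of candidate against every population member."""
    total = 0
    for opponent in population:
        for _ in range(games):
            total += play_gops(candidate, opponent, rng)
    return total / (len(population) * games)


def improve(population, rounds, rng):
    """Repeatedly revise one strategy and keep the best by self-play score."""
    for _ in range(rounds):
        parent = rng.choice(population)
        child = propose_revision(parent, rng)
        pool = population + [child]
        ranked = sorted(pool, key=lambda s: evaluate(s, pool, rng), reverse=True)
        population = ranked[:len(population)]
    return population
```

Calling `improve([0, 0, 0], 10, random.Random(0))` returns a population of the same size whose members survived self-play evaluation. No training data is involved; the only signal is the score margin from simulated games.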

Subjects: Artificial Intelligence, Computation and Language

Published: 2024-08-20 08:22:04 UTC

2408.10635
