Bilevel MCTS for Amortized O(1) Node Selection in Classical Planning

#1 Bilevel MCTS for Amortized O(1) Node Selection in Classical Planning [PDF¹] [Copy] [Kimi] [REL]

We study an efficient implementation of Multi-Armed Bandit (MAB)-based Monte-Carlo Tree Search (MCTS) for classical planning. One weakness of MCTS is that it spends a significant time in deciding which node to expand next. While selecting a node from an OPEN list with N nodes has O(1) runtime complexity with traditional array-based priority-queues for dense integer keys, the tree-based OPEN list used by MCTS requires O(log N), which roughly corresponds to the search depth d. In classical planning, d is arbitrarily large (e.g., 2^k-1 in k-disk Tower-of-Hanoi) and the runtime for node selection is significant, unlike in game tree search, where the cost is negligible compared to the node evaluation (rollouts) because d is inherently limited by the game (e.g. d≦361 in Go). To improve this bottleneck, we propose a bilevel modification to MCTS that runs a best-first search from each selected leaf nodes with an expansion budget proportional to d, which achieves amortized O(1) runtime for node selection, equivalent to traditional queue-based OPEN list. In addition, we introduce Tree Collapsing, an enhancement that reduces action selection steps and further improves the performance.

Subject: AAAI.2026 - Planning, Routing, and Scheduling

40933@AAAI

#1 Bilevel MCTS for Amortized O(1) Node Selection in Classical Planning [PDF1] [Copy] [Kimi] [REL]

#1 Bilevel MCTS for Amortized O(1) Node Selection in Classical Planning [PDF¹] [Copy] [Kimi] [REL]