2405.04407

Total: 1

#1 Super-Exponential Regret for UCT, AlphaGo and Variants [PDF] [Copy] [Kimi²] [REL]

We improve the proofs of the lower bounds of Coquelin and Munos (2007) that demonstrate that UCT can have $\exp(\dots\exp(1)\dots)$ regret (with $\Omega(D)$ exp terms) on the $D$ -chain environment, and that a `polynomial' UCT variant has $\exp_2(\exp_2(D - O(\log D)))$ regret on the same environment -- the original proofs contain an oversight for rewards bounded in $[0, 1]$ , which we fix in the present draft. We also adapt the proofs to AlphaGo's MCTS and its descendants (e.g., AlphaZero, Leela Zero) to also show $\exp_2(\exp_2(D - O(\log D)))$ regret.

Subjects: Machine Learning , Artificial Intelligence

Publish: 2024-05-07 15:35:30 UTC

2405.04407

#1 Super-Exponential Regret for UCT, AlphaGo and Variants [PDF] [Copy] [Kimi2] [REL]

#1 Super-Exponential Regret for UCT, AlphaGo and Variants [PDF] [Copy] [Kimi²] [REL]