2405.04407

Total: 1

#1 Super-Exponential Regret for UCT, AlphaGo and Variants [PDF] [Copy] [Kimi2] [REL]

Authors: Laurent Orseau, Remi Munos

We improve the proofs of the lower bounds of Coquelin and Munos (2007) that demonstrate that UCT can have exp(exp(1)) regret (with Ω(D) exp terms) on the D-chain environment, and that a `polynomial' UCT variant has exp2(exp2(DO(logD))) regret on the same environment -- the original proofs contain an oversight for rewards bounded in [0,1], which we fix in the present draft. We also adapt the proofs to AlphaGo's MCTS and its descendants (e.g., AlphaZero, Leela Zero) to also show exp2(exp2(DO(logD))) regret.

Subjects: Machine Learning , Artificial Intelligence

Publish: 2024-05-07 15:35:30 UTC