gL4muAFwsh@OpenReview

Total: 1

#1 Does Stochastic Gradient really succeed for bandits?

Authors: Dorian Baudry, Emmeran Johnson, Simon Vary, Ciara Pike-Burke, Patrick Rebeschini

Recent works of Mei et al. (2023, 2024) have deepened the theoretical understanding of the *Stochastic Gradient Bandit* (SGB) policy, showing that using a constant learning rate guarantees asymptotic convergence to the optimal policy, and that sufficiently *small* learning rates can yield logarithmic regret. However, whether logarithmic regret holds beyond small learning rates remains unclear. In this work, we take a step towards characterizing the regret *regimes* of SGB as a function of its learning rate. For two-armed bandits, we identify a sharp threshold, scaling with the sub-optimality gap $\Delta$, below which SGB achieves *logarithmic* regret on all instances, and above which it can incur *polynomial* regret on some instances. This result highlights the necessity of knowing (or estimating) $\Delta$ to ensure logarithmic regret with a constant learning rate. For general $K$-armed bandits, we further show that the learning rate must scale inversely with $K$ to avoid polynomial regret. We introduce novel techniques to derive regret upper bounds for SGB, laying the groundwork for future advances in the theory of gradient-based bandit algorithms.
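For readers unfamiliar with the policy under study, below is a minimal sketch of the standard softmax gradient-bandit update with a constant learning rate, run on a Bernoulli instance. The exact gradient estimator and setting analyzed in the paper may differ in details; the function name `sgb`, the Bernoulli reward model, and the chosen means and `eta` are illustrative assumptions, not taken from the paper.

```python
import numpy as np

def sgb(means, eta, T, seed=0):
    """Simulate a softmax stochastic gradient bandit (SGB) policy with a
    constant learning rate `eta` on a Bernoulli bandit with the given arm
    means. Returns the cumulative (pseudo-)regret after each round."""
    rng = np.random.default_rng(seed)
    K = len(means)
    theta = np.zeros(K)                  # softmax logits, one per arm
    regret = np.zeros(T)
    best = max(means)
    for t in range(T):
        pi = np.exp(theta - theta.max())
        pi /= pi.sum()                   # softmax policy over the K arms
        a = rng.choice(K, p=pi)          # sample an arm from the policy
        r = rng.binomial(1, means[a])    # observe a Bernoulli reward
        # unbiased stochastic estimate of the policy gradient, then ascent step
        grad = r * ((np.arange(K) == a) - pi)
        theta += eta * grad
        regret[t] = (regret[t - 1] if t else 0.0) + best - means[a]
    return regret

# Example: two-armed instance with gap Delta = 0.2 and a small learning rate
reg = sgb([0.7, 0.5], eta=0.1, T=10_000)
print(f"cumulative regret after {len(reg)} rounds: {reg[-1]:.1f}")
```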

Subject: NeurIPS.2025 - Oral