Processing math: 16%

EJ2CQwMTci@OpenReview

Total: 1

#1 Connecting Thompson Sampling and UCB: Towards More Efficient Trade-offs Between Privacy and Regret [PDF] [Copy] [Kimi] [REL]

Authors: Bingshan Hu, Zhiming Huang, Tianyue Zhang, Mathias Lécuyer, Nidhi Hegde

We address differentially private stochastic bandit problems by leveraging Thompson Sampling with Gaussian priors and Gaussian differential privacy (GDP). We propose DP-TS-UCB, a novel parametrized private algorithm that enables trading off privacy and regret. DP-TS-UCB satisfies ˜O(T0.25(1α))-GDP and achieves O \left(K\ln^{\alpha+1}(T)/\Delta \right) regret bounds, where K is the number of arms, \Delta is the sub-optimality gap, T is the learning horizon, and \alpha \in [0,1] controls the trade-off between privacy and regret. Theoretically, DP-TS-UCB relies on anti-concentration bounds for the Gaussian distributions, linking the exploration mechanisms of Thompson Sampling and Upper Confidence Bound, which may be of independent research interest.

Subject: ICML.2025 - Poster