We address differentially private stochastic bandit problems by leveraging Thompson Sampling with Gaussian priors and Gaussian differential privacy (GDP). We propose DP-TS-UCB, a novel parametrized private algorithm that enables trading off privacy and regret. DP-TS-UCB satisfies $\tilde{O}(T^{0.25(1-\alpha)})$-GDP and achieves $O\left(K\ln^{\alpha+1}(T)/\Delta\right)$ regret bounds, where $K$ is the number of arms, $\Delta$ is the sub-optimality gap, $T$ is the learning horizon, and $\alpha \in [0,1]$ controls the trade-off between privacy and regret. Theoretically, DP-TS-UCB relies on anti-concentration bounds for Gaussian distributions, linking the exploration mechanisms of Thompson Sampling and Upper Confidence Bound, which may be of independent research interest.
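For concreteness, the sketch below illustrates Thompson Sampling with Gaussian priors, the non-private base algorithm on which DP-TS-UCB builds. The abstract does not specify DP-TS-UCB's private noise calibration or how $\alpha$ enters the sampler, so none of that is shown; the function and parameter names (`gaussian_thompson_sampling`, `rewards_fn`) are illustrative, not from the paper.

```python
import numpy as np


def gaussian_thompson_sampling(rewards_fn, K, T, seed=0):
    """Vanilla Thompson Sampling with Gaussian priors and posteriors.

    Assumes unit-variance Gaussian reward noise and an N(0, 1) prior on each
    arm's mean, so the posterior after n pulls with reward sum s is
    N(s / (n + 1), 1 / (n + 1)). DP-TS-UCB's private modification is not shown.
    """
    rng = np.random.default_rng(seed)
    counts = np.zeros(K)  # number of pulls per arm
    sums = np.zeros(K)    # cumulative reward per arm
    for _ in range(T):
        # Sample one value from each arm's Gaussian posterior
        means = sums / (counts + 1.0)
        stds = 1.0 / np.sqrt(counts + 1.0)
        samples = rng.normal(means, stds)
        # Pull the arm with the largest sampled mean and update its posterior
        arm = int(np.argmax(samples))
        reward = rewards_fn(arm)
        counts[arm] += 1
        sums[arm] += reward
    return counts, sums


if __name__ == "__main__":
    # Toy Bernoulli bandit with K = 3 arms; most pulls should go to arm 2
    true_means = [0.3, 0.5, 0.7]
    reward_rng = np.random.default_rng(1)
    pulls, _ = gaussian_thompson_sampling(
        lambda a: reward_rng.binomial(1, true_means[a]), K=3, T=5000)
    print(pulls)
```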