Total: 1
This paper addresses the Bayesian optimization problem (also referred to as the Bayesian setting of the Gaussian process bandit), where the learner seeks to minimize the regret under a function drawn from a known Gaussian process (GP). Under a Matérn kernel with some extent of smoothness, we show that the Gaussian process upper confidence bound (GP-UCB) algorithm achieves $\tilde{O}(\sqrt{T})$ cumulative regret with high probability. Furthermore, our analysis yields $O(\sqrt{T \ln^2 T})$ regret under a squared exponential kernel. These results fill the gap between the existing regret upper bound of GP-UCB and the current best upper bound provided by Scarlett [2018]. The key idea in our proof is to capture the concentration behavior of the input sequence realized by GP-UCB, enabling us to handle GP's information gain in a refined manner.