COLT.2021 - Award

Total: 7

#1 Non-stationary Reinforcement Learning without Prior Knowledge: an Optimal Black-box Approach

Authors: Chen-Yu Wei ; Haipeng Luo

We propose a black-box reduction that turns a certain reinforcement learning algorithm with optimal regret in a (near-)stationary environment into another algorithm with optimal dynamic regret in a non-stationary environment, importantly without any prior knowledge of the degree of non-stationarity. By plugging different algorithms into our black-box, we provide a list of examples showing that our approach not only recovers recent results for (contextual) multi-armed bandits achieved by very specialized algorithms, but also significantly improves the state of the art for (generalized) linear bandits, episodic MDPs, and infinite-horizon MDPs in various ways. Specifically, in most cases our algorithm achieves the optimal dynamic regret $\widetilde{\mathcal{O}}(\min\{\sqrt{LT}, \Delta^{\frac{1}{3}}T^{\frac{2}{3}}\})$ where $T$ is the number of rounds and $L$ and $\Delta$ are the number and amount of changes of the world respectively, while previous works only obtain suboptimal bounds and/or require the knowledge of $L$ and $\Delta$.
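To make the two regimes of the bound concrete, the following minimal sketch evaluates $\min\{\sqrt{LT}, \Delta^{1/3}T^{2/3}\}$ numerically. The function name `dynamic_regret_rate` and the sample values are illustrative assumptions, and constants and logarithmic factors hidden by $\widetilde{\mathcal{O}}$ are ignored.

```python
import math


def dynamic_regret_rate(T, L, delta):
    """Evaluate min(sqrt(L*T), delta^(1/3) * T^(2/3)).

    Hypothetical helper: it drops the constants and log factors
    hidden in the O-tilde notation of the abstract.
    """
    switching_term = math.sqrt(L * T)                  # driven by the number of changes L
    variation_term = delta ** (1 / 3) * T ** (2 / 3)   # driven by the total amount of change
    return min(switching_term, variation_term)


# Few abrupt but large changes: the sqrt(L*T) term is the smaller one.
few_switches = dynamic_regret_rate(T=10_000, L=4, delta=100.0)

# Many tiny changes with small total variation: the variation term is smaller.
slow_drift = dynamic_regret_rate(T=10_000, L=10_000, delta=1.0)

print(few_switches, slow_drift)
```

Which term is active depends only on how $L$ compares with $\Delta^{2/3}T^{1/3}$, which is why an algorithm that knows neither quantity must adapt to both.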

#2 The Min-Max Complexity of Distributed Stochastic Convex Optimization with Intermittent Communication

Authors: Blake E Woodworth ; Brian Bullins ; Ohad Shamir ; Nathan Srebro

We resolve the min-max complexity of distributed stochastic convex optimization (up to a log factor) in the intermittent communication setting, where $M$ machines work in parallel over the course of $R$ rounds of communication to optimize the objective, and during each round of communication, each machine may sequentially compute $K$ stochastic gradient estimates. We present a novel lower bound with a matching upper bound that establishes an optimal algorithm.
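The intermittent communication setting can be illustrated with a small simulation. The sketch below uses Local SGD on a toy one-dimensional quadratic purely to show the roles of $M$, $R$, and $K$; it is a common baseline chosen here for illustration, not the optimal algorithm established in the paper, and the objective, step size, and function names are assumptions.

```python
import random


def stochastic_gradient(x, rng, target=3.0, noise_std=1.0):
    """Noisy gradient of the toy objective f(x) = 0.5 * (x - target)**2."""
    return (x - target) + rng.gauss(0.0, noise_std)


def local_sgd(M, R, K, step_size=0.1, seed=0):
    """Run Local SGD with M machines, R communication rounds, K local steps.

    Illustrative baseline only: each machine takes K sequential stochastic
    gradient steps between communications, then the iterates are averaged.
    """
    rng = random.Random(seed)
    x = 0.0  # iterate shared by all machines at the start of each round
    for _ in range(R):
        local_iterates = []
        for _ in range(M):
            x_local = x
            for _ in range(K):  # K sequential stochastic gradients per round
                x_local -= step_size * stochastic_gradient(x_local, rng)
            local_iterates.append(x_local)
        x = sum(local_iterates) / M  # the single communication of this round
    return x


estimate = local_sgd(M=8, R=20, K=10)
print(estimate)
```

In this toy run the returned iterate should land near the minimizer 3.0; the total gradient budget is $M \cdot K \cdot R$, but only $R$ averaging steps are allowed.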

#3 Stochastic block model entropy and broadcasting on trees with survey

Authors: Emmanuel Abbe ; Elisabetta Cornacchia ; Yuzhou Gu ; Yury Polyanskiy

The limit of the entropy in the stochastic block model (SBM) has been characterized in the sparse regime for the special case of disassortative communities [Coja-Oghlan et al. (2017)] and for the classical case of assortative communities but in the dense regime [Deshpande et al. (2016)]. The problem has not been closed in the classical sparse and assortative case. This paper establishes the result in this case for every SNR outside the interval (1, 3.513). It further gives an approximation to the limit in this window. The result is obtained by expressing the global SBM entropy as an integral of local tree entropies in a broadcasting on trees model with erasure side-information. The main technical advancement then relies on showing the irrelevance of the boundary in such a model, also studied with variants in [Kanade et al. (2016)], [Mossel et al. (2016)] and [Mossel and Xu (2015)]. In particular, we establish the uniqueness of the BP fixed point in the survey model for any SNR above 3.513 or below 1. This only leaves a narrow region in the plane between SNR and survey strength where the uniqueness of BP conjectured in these papers remains unproved.

#4 Optimal Dynamic Regret in Exp-Concave Online Learning

Authors: Dheeraj Baby ; Yu-Xiang Wang

We consider the problem of Zinkevich (2003)-style dynamic regret minimization in online learning with \emph{exp-concave} losses. We show that whenever improper learning is allowed, a Strongly Adaptive online learner achieves the dynamic regret of $\tilde O^*(n^{1/3}C_n^{2/3} \vee 1)$ where $C_n$ is the \emph{total variation} (a.k.a. \emph{path length}) of an arbitrary sequence of comparators that may not be known to the learner ahead of time. Achieving this rate was highly nontrivial even for square losses in 1D where the best known upper bound was $O(\sqrt{nC_n} \vee \log n)$ (Yuan and Lamperski, 2019). Our new proof techniques make elegant use of the intricate structures of the primal and dual variables imposed by the KKT conditions and could be of independent interest. Finally, we apply our results to the classical statistical problem of \emph{locally adaptive non-parametric regression} (Mammen, 1991; Donoho and Johnstone, 1998) and obtain a stronger and more flexible algorithm that does not require any statistical assumptions or any hyperparameter tuning.
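As a rough illustration of the quantities in this abstract, the sketch below computes the path length $C_n$ of a scalar comparator sequence and compares the two rates. It assumes the 1D case with $C_n = \sum_t |u_t - u_{t-1}|$, ignores constants and the factors hidden by $\tilde O^*$, and all function names are hypothetical.

```python
import math


def path_length(comparators):
    """Total variation of a scalar comparator sequence (1D assumption)."""
    return sum(abs(b - a) for a, b in zip(comparators, comparators[1:]))


def new_rate(n, c_n):
    """n^(1/3) * C_n^(2/3) v 1, without constants or hidden factors."""
    return max(n ** (1 / 3) * c_n ** (2 / 3), 1.0)


def previous_rate(n, c_n):
    """sqrt(n * C_n) v log n, without constants."""
    return max(math.sqrt(n * c_n), math.log(n))


# A comparator sequence with a single unit jump halfway through.
comparators = [0.0] * 500 + [1.0] * 500
n = len(comparators)
c_n = path_length(comparators)

print(c_n, new_rate(n, c_n), previous_rate(n, c_n))
```

For this sequence the new rate evaluates to about 10 against about 31.6 for the earlier bound; more generally the ratio of the two leading terms is $(C_n/n)^{1/6}$, so the gap widens as the comparators vary less.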

#5 Online Learning with Simple Predictors and a Combinatorial Characterization of Minimax in 0/1 Games

Authors: Steve Hanneke ; Roi Livni ; Shay Moran

Which classes can be learned properly in the online model, that is, by an algorithm that on each round uses a predictor from the concept class? While there are simple and natural cases where improper learning is useful and even necessary, it is natural to ask how complex the improper predictors must be in such cases. Can one always achieve nearly optimal mistake/regret bounds using "simple" predictors? In this work, we give a complete characterization of when this is possible, thus settling an open problem which has been studied since the pioneering works of Angluin (1987) and Littlestone (1988). More precisely, given any concept class C and any hypothesis class H, we provide nearly tight bounds (up to a log factor) on the optimal mistake bounds for online learning C using predictors from H. Our bound yields an exponential improvement over the previously best known bound by Chase and Freitag (2020). As applications, we give constructive proofs showing that (i) in the realizable setting, a near-optimal mistake bound (up to a constant factor) can be attained by a sparse majority-vote of proper predictors, and (ii) in the agnostic setting, a near-optimal regret bound (up to a log factor) can be attained by a randomized proper algorithm. The latter was proven non-constructively by Rakhlin, Sridharan, and Tewari (2015). It was also achieved by constructive but improper algorithms proposed by Ben-David, Pal, and Shalev-Shwartz (2009) and Rakhlin, Shamir, and Sridharan (2012). A technical ingredient of our proof which may be of independent interest is a generalization of the celebrated Minimax Theorem (von Neumann, 1928) for binary zero-sum games with arbitrary action-sets: a simple game which fails to satisfy Minimax is "Guess the Larger Number". In this game, each player picks a natural number and the player who picked the larger number wins. Equivalently, the payoff matrix of this game is infinite triangular. We show that this is the only obstruction: if the payoff matrix does not contain triangular submatrices of unbounded sizes then the Minimax theorem is satisfied. This generalizes von Neumann’s Minimax Theorem by removing requirements of finiteness (or compactness) of the action-sets, and moreover it captures precisely the types of games of interest in online learning: namely, Littlestone games.
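The failure of Minimax in "Guess the Larger Number" can be checked with a short calculation. The formalization below is one possible reading of the game, with the assumption that the row player wins only on a strictly larger number, so the payoff is $A(i,j) = \mathbb{1}[i > j]$ for $i, j \in \mathbb{N}$; $p$ and $q$ denote mixed strategies of the row and column player.

$$
\begin{aligned}
\sup_{p}\,\inf_{j \in \mathbb{N}}\ \Pr_{i \sim p}[\,i > j\,]
  &= \sup_{p}\,\lim_{j \to \infty} \Pr_{i \sim p}[\,i > j\,] = 0, \\
\inf_{q}\,\sup_{i \in \mathbb{N}}\ \Pr_{j \sim q}[\,i > j\,]
  &= \inf_{q}\,\lim_{i \to \infty} \Pr_{j \sim q}[\,j < i\,] = 1.
\end{aligned}
$$

Both limits use only that a probability distribution on $\mathbb{N}$ has vanishing tail mass: whoever commits to a mixed strategy first loses with probability arbitrarily close to 1, so the max-min value is 0 while the min-max value is 1.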

#6 Statistical Query Algorithms and Low Degree Tests Are Almost Equivalent

Authors: Matthew S Brennan ; Guy Bresler ; Sam Hopkins ; Jerry Li ; Tselil Schramm

Researchers currently use a number of approaches to predict and substantiate information-computation gaps in high-dimensional statistical estimation problems. A prominent approach is to characterize the limits of restricted models of computation, which on the one hand yields strong computational lower bounds for powerful classes of algorithms and on the other hand helps guide the development of efficient algorithms. In this paper, we study two of the most popular restricted computational models, the statistical query framework and low-degree polynomials, in the context of high-dimensional hypothesis testing. Our main result is that under mild conditions on the testing problem, the two classes of algorithms are essentially equivalent in power. As corollaries, we obtain new statistical query lower bounds for sparse PCA, tensor PCA and several variants of the planted clique problem.

#7 Improved Regret for Zeroth-Order Stochastic Convex Bandits

Authors: Tor Lattimore ; Andras Gyorgy

We present an efficient algorithm for stochastic bandit convex optimisation with no assumptions on smoothness or strong convexity and for which the regret is bounded by $O(d^{4.5} \sqrt{n} \, \mathrm{polylog}(n))$, where $n$ is the number of interactions and $d$ is the dimension.