Total: 1
Traditional machine learning models can be formulated as the expected risk minimization (ERM) problem: minw∈Rd Eξ [l(w; ξ)], where w ∈ Rd denotes the model parameter, ξ represents training samples, l(·) is the loss function. Numerous optimization algorithms, such as stochastic gradient descent (SGD), have been developed to solve the ERM problem. However, a wide range of emerging machine learning models are beyond this class of optimization problems, such as model-agnostic meta-learning (Finn, Abbeel, and Levine 2017). Of particular interest of my research is the stochastic nested optimization (SNO) problem, whose objective function has a nested structure. Specifically, I have been focusing on two instances of this kind of problem: stochastic compositional optimization (SCO) problems, which cover meta-learning, area-under-the-precision recall-curve optimization, contrastive self-supervised learning, etc., and stochastic bilevel optimization (SBO) problems, which can be applied to meta-learning, hyperparameter optimization, neural network architecture search, etc. With the emergence of large-scale distributed data, such as the user data generated on mobile devices or intelligent hardware, it is imperative to develop distributed optimization algorithms for SNO (Distributed SNO). A significant challenge for optimizing distributed SNO problems lies in that the stochastic (hyper-)gradient is a biased estimation of the full gradient. Thus, existing distributed optimization algorithms when applied to them suffer from slow convergence rates. In this talk, I will discuss my recent works about distributed SCO (Gao and Huang 2021; Gao, Li, and Huang 2022) and distributed SBO (Gao, Gu, and Thai 2022; Gao 2022) under both centralized and decentralized settings, including algorithmic details about reducing the bias of stochastic gradient, theoretical convergence rate, and practical machine learning applications, and then highlight challenges for future research.