Given society's increasing reliance on data, the collection of data and its processing into useful information is a technical problem of growing importance and, perhaps paradoxically, a critical bottleneck in many data science and machine learning applications. Yet even for the most basic statistical problems, such as mean estimation, there is a theory-practice divide. Conventional methods like the sample mean, while supported by theoretical results under strong assumptions, are often brittle in the presence of extreme data. Practitioners therefore often resort to ad-hoc, unprincipled "outlier removal" heuristics, which can lead to wrong conclusions (e.g. Millikan's underestimation of the electron charge). In this talk, I will describe my work that essentially resolves the fundamental 1-dimensional mean estimation problem. I will present the construction of a statistically optimal and computationally efficient 1-dimensional mean estimator whose estimation error is optimal even in the leading multiplicative constant, under bare minimum distributional assumptions (FOCS 2021). Furthermore, I will discuss its various robustness properties (ICML 2025 Oral), in particular highlighting robustness to adversarial sample corruption.
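To make the brittleness of the sample mean concrete, here is a minimal sketch (not the estimator from the talk): a sample contaminated by a handful of extreme values drags the naive mean far from the truth, while median-of-means, a classical robust baseline chosen purely for illustration, stays close.

```python
import random
import statistics

def median_of_means(xs, k):
    """Split xs into k equal blocks, average each block, return the median
    of the block means. A standard robust baseline (illustrative only)."""
    block = len(xs) // k
    means = [sum(xs[i * block:(i + 1) * block]) / block for i in range(k)]
    return statistics.median(means)

random.seed(0)
# Mostly standard-normal data (true mean 0) with a few extreme observations.
data = [random.gauss(0, 1) for _ in range(1000)]
data[:5] = [1e6] * 5  # a handful of "outliers"

naive = sum(data) / len(data)       # dragged to roughly 5000 by the outliers
robust = median_of_means(data, 20)  # remains close to the true mean 0

print(naive, robust)
```

The contaminated values corrupt only a minority of the blocks, so the median of the block means ignores them; this is the kind of principled alternative to ad-hoc outlier removal that the talk's optimal estimator refines.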