Information-Computation Tradeoffs for Noiseless Linear Regression with Oblivious Contamination


Authors: Ilias Diakonikolas, Chao Gao, Daniel Kane, John Lafferty, Ankit Pensia

We study the task of noiseless linear regression under Gaussian covariates in the presence of additive oblivious contamination. Specifically, we are given i.i.d. samples from a distribution $(x, y)$ on $\mathbb R^d \times \mathbb R$ with $x \sim \mathcal N(0,I_d)$ and $y = x^\top \beta + z$, where $z$ is drawn from an unknown distribution that is independent of $x$. Moreover, $z$ satisfies $\mathbb P[z = 0] = \alpha>0$. The goal is to accurately recover the regressor $\beta$ to small $\ell_2$-error. Ignoring computational considerations, this problem is known to be solvable using $O(d/\alpha)$ samples. On the other hand, the best known polynomial-time algorithms require $\Omega(d/\alpha^2)$ samples. Here we provide formal evidence that the quadratic dependence in $1/\alpha$ is inherent for efficient algorithms. Specifically, we show that any efficient Statistical Query algorithm for this task requires VSTAT complexity at least $\tilde{\Omega}(d^{1/2}/\alpha^2)$.
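As an illustration of the data model only (not of any algorithm from the paper), the sampling process described above can be sketched in Python. The function name `sample_contaminated_regression`, the scaled Cauchy contamination, and the parameter values are assumptions made for this sketch; in the problem itself the distribution of $z$ is unknown and arbitrary apart from $\mathbb P[z = 0] = \alpha$. The final least-squares fit uses the hidden clean mask, which a real estimator does not have access to, purely to show that the clean samples satisfy $y = x^\top \beta$ exactly.

```python
import numpy as np


def sample_contaminated_regression(n, beta, alpha, rng):
    """Draw n samples (x, y) with x ~ N(0, I_d) and y = x^T beta + z.

    z is independent of x and equals 0 with probability alpha.
    Hypothetical choice for this sketch: the nonzero part of z is
    heavy-tailed Cauchy noise. The abstract leaves it unspecified.
    """
    d = beta.shape[0]
    x = rng.standard_normal((n, d))          # Gaussian covariates
    clean = rng.random(n) < alpha            # which samples have z = 0
    noise = 10.0 * rng.standard_cauchy(n)    # assumed contamination
    z = np.where(clean, 0.0, noise)          # oblivious: never looks at x
    y = x @ beta + z
    return x, y, clean


rng = np.random.default_rng(0)
d, n, alpha = 5, 2000, 0.2
beta = rng.standard_normal(d)

x, y, clean = sample_contaminated_regression(n, beta, alpha, rng)

# Fraction of samples with exactly zero contamination, close to alpha.
frac_clean = float(np.mean(clean))

# Oracle check only: with the hidden clean mask, ordinary least squares
# recovers beta up to floating-point error.
beta_oracle, *_ = np.linalg.lstsq(x[clean], y[clean], rcond=None)
```

The difficulty the abstract refers to is that `clean` is not observed: an estimator sees only `x` and `y`, and only roughly an $\alpha$ fraction of the responses are uncontaminated.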

Subject: NeurIPS.2025 - Poster

XqHrG8lBai@OpenReview
