Statistics Theory

2026-02-17 | | Total: 16

#1 Topological trivialization in non-convex empirical risk minimization [PDF] [Copy] [Kimi] [REL]

Authors: Andrea Montanari, Basil Saeed

Given data $\{({\boldsymbol x}_i,y_i): i\le n\}$, with ${\boldsymbol x}_i$ standard $d$-dimensional Gaussian feature vectors, and $y_i\in{\mathbb R}$ response variables, we study the general problem of learning a model parametrized by ${\boldsymbol θ}\in{\mathbb R}^d$, by minimizing a loss function that depends on ${\boldsymbol θ}$ via the one-dimensional projections ${\boldsymbol θ}^{\sf T}{\boldsymbol x}_i$. While previous work mostly dealt with convex losses, our approach assumes general (non-convex) losses hence covering classical, yet poorly understood examples such as the perceptron and non-convex robust regression. We use the Kac-Rice formula to control the asymptotics of the expected number of local minima of the empirical risk, under the proportional asymptotics $n,d\to\infty$, $n/d\toα>1$. Specifically, we prove a finite dimensional variational formula for the exponential growth rate of the expected number of local minima. Further we provide sufficient conditions under which the exponential growth rate vanishes and all empirical risk minimizers have the same asymptotic properties (in fact, we expect the minimizer to be unique in these circumstances). We refer to this phenomenon as `rate trivialization.' If the population risk has a unique minimizer, our sufficient condition for rate trivialization is typically verified when the samples/parameters ratio $α$ is larger than a suitable constant $α_{\star}$. Previous general results of this type required $n\ge Cd \log d$. We illustrate our results in the case of non-convex robust regression. Based on heuristic arguments and numerical simulations, we present a conjecture for the exact location of the trivialization phase transition $α_{\text{tr}}$.

Subject: Statistics Theory

Publish: 2026-02-16 17:55:50 UTC


#2 Bias analysis of a linear order-statistic inequality index estimator: Unbiasedness under gamma populations [PDF] [Copy] [Kimi] [REL]

Authors: Roberto Vila, Helton Saulo

This paper studies a class of rank-based inequality measures built from linear combinations of expected order statistics. The proposed framework unifies several well-known indices, including the classical Gini coefficient, the $m$th Gini index, extended $m$th Gini index and $S$-Gini index, and also connects to spectral inequality measures through an integral representation. We investigate the finite-sample behavior of a natural U-statistic-type estimator that averages weighted order-statistic contrasts over all subsamples of fixed size and normalizes by the sample mean. A general bias decomposition is derived in terms of components that isolate the effect of random normalization on each rank level, yielding analytical expressions that can be evaluated under broad non-negative distributions via Laplace-transform methods. Under mild moment conditions, the estimator is shown to be asymptotically unbiased. Moreover, we prove exact unbiasedness under gamma populations for any sample size, extending earlier unbiasedness results for Gini-type estimators. A Monte Carlo study is performed to numerically check that the theoretical unbiasednes under gamma populations.

Subjects: Statistics Theory , Methodology

Publish: 2026-02-16 15:53:46 UTC


#3 Frequentist Regret Analysis of Gaussian Process Thompson Sampling via Fractional Posteriors [PDF] [Copy] [Kimi] [REL]

Authors: Somjit Roy, Prateek Jaiswal, Anirban Bhattacharya, Debdeep Pati, Bani K. Mallick

We study Gaussian Process Thompson Sampling (GP-TS) for sequential decision-making over compact, continuous action spaces and provide a frequentist regret analysis based on fractional Gaussian process posteriors, without relying on domain discretization as in prior work. We show that the variance inflation commonly assumed in existing analyses of GP-TS can be interpreted as Thompson Sampling with respect to a fractional posterior with tempering parameter $α\in (0,1)$. We derive a kernel-agnostic regret bound expressed in terms of the information gain parameter $γ_t$ and the posterior contraction rate $ε_t$, and identify conditions on the Gaussian process prior under which $ε_t$ can be controlled. As special cases of our general bound, we recover regret of order $\tilde{\mathcal{O}}(T^{\frac{1}{2}})$ for the squared exponential kernel, $\tilde{\mathcal{O}}(T^{\frac{2ν+3d}{2(2ν+d)}} )$ for the Matérn-$ν$ kernel, and a bound of order $\tilde{\mathcal{O}}(T^{\frac{2ν+3d}{2(2ν+d)}})$ for the rational quadratic kernel. Overall, our analysis provides a unified and discretization-free regret framework for GP-TS that applies broadly across kernel classes.

Subjects: Statistics Theory , Machine Learning , Optimization and Control , Machine Learning

Publish: 2026-02-16 05:18:13 UTC


#4 High-accuracy log-concave sampling with stochastic queries [PDF] [Copy] [Kimi] [REL]

Authors: Fan Chen, Sinho Chewi, Constantinos Daskalakis, Alexander Rakhlin

We show that high-accuracy guarantees for log-concave sampling -- that is, iteration and query complexities which scale as $\mathrm{poly}\log(1/δ)$, where $δ$ is the desired target accuracy -- are achievable using stochastic gradients with subexponential tails. Notably, this exhibits a separation with the problem of convex optimization, where stochasticity (even additive Gaussian noise) in the gradient oracle incurs $\mathrm{poly}(1/δ)$ queries. We also give an information-theoretic argument that light-tailed stochastic gradients are necessary for high accuracy: for example, in the bounded variance case, we show that the minimax-optimal query complexity scales as $Θ(1/δ)$. Our framework also provides similar high accuracy guarantees under stochastic zeroth order (value) queries.

Subjects: Statistics Theory , Data Structures and Algorithms , Machine Learning , Probability

Publish: 2026-02-15 23:19:07 UTC


#5 Kernel Estimation Of Chatterjee's Dependence Coefficient [PDF] [Copy] [Kimi] [REL]

Authors: Mona Azadkia, Holger Dette

Dette, Siburg, and Stoimenov (2013) introduced a copula-based measure of dependence, which implies independence if it vanishes and is equal to 1 if one variable is a measurable function of the other. For continuous distributions, the dependence measure also appears as stochastic limit of Chatterjee's rank correlation (Chatterjee, 2021). They proved asymptotic normality of a corresponding kernel estimator with a parametric rate of convergence. In recent work Shi, Drton, and Han (2022) revealed empirically and theoretically that under independence the asymptotic variance degenerates. In this note, we derive the correct asymptotic distribution of the kernel estimator under the null hypothesis of independence. We show that after a suitable centering and rescaling at a rate larger than $\sqrt{n}$ (where $n$ is the sample size), the estimator is asymptotically normal. The analysis relies on a refined central limit theorem for double-indexed linear permutation statistics and accounts for boundary effects that are asymptotically non-negligible. As a consequence, we obtain a valid basis for independence testing without relying on permutations and argue that tests based on the kernel estimator detect local alternatives converging to the null at a faster rate than those detectable by Chatterjee's rank correlation.

Subject: Statistics Theory

Publish: 2026-02-15 16:03:25 UTC


#6 Mean-Square Convergence of a New Parameterized Leapfrog Scheme for Hamiltonian Systems Driven by Gaussian Process Potentials [PDF] [Copy] [Kimi] [REL]

Author: Sourabh Bhattacharya

This paper establishes the mean-square convergence of a new stochastic, parameterized leapfrog scheme introduced in our companion paper Mazumder et al. (2026) for Hamiltonian systems with Gaussian process potentials. We consider a one-step numerical integrator and provide a complete, rigorous analysis under minimal regularity assumptions on the Gaussian potential. The key technical contribution is identifying and exploiting the symplectic structure ingrained in our stochastic, parameterized leapfrog method. Combined with local truncation error analysis, this leads to a global error bound of O(δt) in mean-square sense. Our results establish that although the spatio-temporal model of Mazumder et al. (2026) arises as the anticipated new stochastic leapfrog solution of a system of modified (parameterized) stochastic Hamiltonian equations, the new stochastic leapfrog actually solves the traditional stochastic Hamiltonian equations, driven by Gaussian process potential.

Subjects: Statistics Theory , Numerical Analysis

Publish: 2026-02-15 08:46:12 UTC


#7 Ensemble-Conditional Gaussian Processes (Ens-CGP): Representation, Geometry, and Inference [PDF] [Copy] [Kimi] [REL]

Authors: Sai Ravela, Jae Deok Kim, Kenneth Gee, Xingjian Yan, Samson Mercier, Lubna Albarghouty, Anamitra Saha

We formulate Ensemble-Conditional Gaussian Processes (Ens-CGP), a finite-dimensional synthesis that centers ensemble-based inference on the conditional Gaussian law. Conditional Gaussian processes (CGP) arise directly from Gaussian processes under conditioning and, in linear-Gaussian settings, define the full posterior distribution for a Gaussian prior and linear observations. Classical Kalman filtering is a recursive algorithm that computes this same conditional law under dynamical assumptions; the conditional Gaussian law itself is therefore the underlying representational object, while the filter is one computational realization. In this sense, CGP provides the probabilistic foundation for Kalman-type methods as well as equivalent formulations as a strictly convex quadratic program (MAP estimation), RKHS-regularized regression, and classical regularization. Ens-CGP is the ensemble instantiation of this object, obtained by treating empirical ensemble moments as a (possibly low-rank) Gaussian prior and performing exact conditioning. By separating representation (GP -> CGP -> Ens-CGP) from computation (Kalman filters, EnKF variants, and iterative ensemble schemes), the framework links an earlier-established representational foundation for inference to ensemble-derived priors and clarifies the relationships among probabilistic, variational, and ensemble perspectives.

Subjects: Statistics Theory , Information Theory , Machine Learning , Optimization and Control , Applications , Machine Learning

Publish: 2026-02-14 20:00:43 UTC


#8 Semi-supervised linear regression with missing covariates [PDF] [Copy] [Kimi] [REL]

Authors: Benedict M. Risebrow, Thomas B. Berrett

Missing values in datasets are common in applied statistics. For regression problems, theoretical work thus far has largely considered the issue of missing covariates as distinct from missing responses. However, in practice, many datasets have both forms of missingness. Motivated by this gap, we study linear regression with a labelled dataset containing missing covariates, potentially alongside an unlabelled dataset. We consider both structured (blockwise-missing) and unstructured missingness patterns, along with sparse and non-sparse regression parameters. For the non-sparse case, we provide an estimator based on imputing the missing data combined with a reweighting step. For the high-dimensional sparse case, we use a modified version of the Dantzig selector. We provide non-asymptotic upper bounds on the risk of both procedures. These are matched by several new minimax lower bounds, demonstrating the rate optimality of our estimators. Notably, even when the linear model is well-specified, our results characterise substantial differences in the minimax rates when unlabelled data is present relative to the fully supervised setting. Particular consequences of our sparse and non-sparse results include the first matching upper and lower bounds on the minimax rate for the supervised setting when either unstructured or structured missingness is present. Our theory is coupled with extensive simulations and a semi-synthetic application to the California housing dataset.

Subjects: Statistics Theory , Methodology

Publish: 2026-02-14 11:41:48 UTC


#9 An Improved Milstein Method for the Numerical Solution of Multidimensional Stochastic Differential Equations [PDF] [Copy] [Kimi] [REL]

Authors: Paromita Banerjee, Anirban Mondal

Stochastic differential equations (SDEs) offer powerful and accessible mathematical models for capturing both deterministic and probabilistic aspects of dynamic behavior across a wide range of physical, financial, and social systems. However, analytical solutions for many SDEs are often unavailable, necessitating the use of numerical approximation methods. The rate of convergence of such numerical methods is of great importance, as it directly influences both computational efficiency and accuracy. This paper presents a proposed theorem, along with its proof, that facilitates the numerical evaluation of the strong (and weak) order of convergence of a numerical scheme for an SDE when the analytical solution is unavailable. Additionally, we address the challenge of numerically computing the multiple stochastic integrals required by the Milstein method to achieve improved convergence rates for multidimensional SDEs. In this context, two newly proposed numerical techniques for computing these multiple stochastic integrals are introduced and compared with existing approaches in terms of efficiency and effectiveness. The methodologies are further illustrated through simulation studies and applications to widely used financial models.

Subjects: Statistics Theory , Numerical Analysis , Probability , Other Statistics

Publish: 2026-02-14 02:54:38 UTC


#10 Intrinsic dimension concentration inequalities for self-adjoint operators [PDF] [Copy] [Kimi] [REL]

Authors: Diego Martinez-Taboada, Aaditya Ramdas

We derive novel concentration inequalities for the operator norm of the sum of self-adjoint operators that do not explicitly depend on the underlying dimension of the operator, but rather an intrinsic notion of it. Our analysis leads to tighter results (in terms of constants) and simplified proofs. Our results unify the current intrinsic-dimension and ambient-dimension inequalities under independence, strictly improving both categories of bounds (such as by Tropp and Minsker). We present a general master theorem that we instantiate to obtain specific sub-Gaussian, Hoeffding, Bernstein, Bennett, and sub-exponential type inequalities. We also establish widely applicable concentration bounds under martingale dependence that provide tighter control than existing results.

Subject: Statistics Theory

Publish: 2026-02-13 21:15:09 UTC


#11 Efficient Sampling with Discrete Diffusion Models: Sharp and Adaptive Guarantees [PDF] [Copy] [Kimi] [REL]

Authors: Daniil Dmitriev, Zhihan Huang, Yuting Wei

Diffusion models over discrete spaces have recently shown striking empirical success, yet their theoretical foundations remain incomplete. In this paper, we study the sampling efficiency of score-based discrete diffusion models under a continuous-time Markov chain (CTMC) formulation, with a focus on $τ$-leaping-based samplers. We establish sharp convergence guarantees for attaining $\varepsilon$ accuracy in Kullback-Leibler (KL) divergence for both uniform and masking noising processes. For uniform discrete diffusion, we show that the $τ$-leaping algorithm achieves an iteration complexity of order $\tilde O(d/\varepsilon)$, with $d$ the ambient dimension of the target distribution, eliminating linear dependence on the vocabulary size $S$ and improving existing bounds by a factor of $d$; moreover, we establish a matching algorithmic lower bound showing that linear dependence on the ambient dimension is unavoidable in general. For masking discrete diffusion, we introduce a modified $τ$-leaping sampler whose convergence rate is governed by an intrinsic information-theoretic quantity, termed the effective total correlation, which is bounded by $d \log S$ but can be sublinear or even constant for structured data. As a consequence, the sampler provably adapts to low-dimensional structure without prior knowledge or algorithmic modification, yielding sublinear convergence rates for various practical examples (such as hidden Markov models, image data, and random graphs). Our analysis requires no boundedness or smoothness assumptions on the score estimator beyond control of the score entropy loss.

Subjects: Machine Learning , Information Theory , Statistics Theory , Machine Learning

Publish: 2026-02-16 18:48:17 UTC


#12 Random geometric graphs with smooth kernels: sharp detection threshold and a spectral conjecture [PDF] [Copy] [Kimi] [REL]

Authors: Cheng Mao, Yihong Wu, Jiaming Xu

A random geometric graph (RGG) with kernel $K$ is constructed by first sampling latent points $x_1,\ldots,x_n$ independently and uniformly from the $d$-dimensional unit sphere, then connecting each pair $(i,j)$ with probability $K(\langle x_i,x_j\rangle)$. We study the sharp detection threshold, namely the highest dimension at which an RGG can be distinguished from its Erdős--Rényi counterpart with the same edge density. For dense graphs, we show that for smooth kernels the critical scaling is $d = n^{3/4}$, substantially lower than the threshold $d = n^3$ known for the hard RGG with step-function kernels \cite{bubeck2016testing}. We further extend our results to kernels whose signal-to-noise ratio scales with $n$, and formulate a unifying conjecture that the critical dimension is determined by $n^3 \mathop{\rm tr}^2(κ^3) = 1$, where $κ$ is the standardized kernel operator on the sphere. Departing from the prevailing approach of bounding the Kullback-Leibler divergence by successively exposing latent points, which breaks down in the sublinear regime of $d=o(n)$, our key technical contribution is a careful analysis of the posterior distribution of the latent points given the observed graph, in particular, the overlap between two independent posterior samples. As a by-product, we establish that $d=\sqrt{n}$ is the critical dimension for non-trivial estimation of the latent vectors up to a global rotation.

Subjects: Probability , Information Theory , Social and Information Networks , Statistics Theory

Publish: 2026-02-16 18:28:44 UTC


#13 Block Empirical Likelihood Inference for Longitudinal Generalized Partially Linear Single-Index Models [PDF] [Copy] [Kimi] [REL]

Authors: Tianni Zhang, Yuyao Wang, Yu Lu, and Mengfei Ran

Generalized partially linear single-index models (GPLSIMs) provide a flexible and interpretable semiparametric framework for longitudinal outcomes by combining a low-dimensional parametric component with a nonparametric index component. For repeated measurements, valid inference is challenging because within-subject correlation induces nuisance parameters and variance estimation can be unstable in semiparametric settings. We propose a profile estimating-equation approach based on spline approximation of the unknown link function and construct a subject-level block empirical likelihood (BEL) for joint inference on the parametric coefficients and the single-index direction. The resulting BEL ratio statistic enjoys a Wilks-type chi-square limit, yielding likelihood-free confidence regions without explicit sandwich variance estimation. We also discuss practical implementation, including constrained optimization for the index parameter, working-correlation choices, and bootstrap-based confidence bands for the nonparametric component. Simulation studies and an application to the epilepsy longitudinal study illustrate the finite-sample performance.

Subjects: Methodology , Statistics Theory , Computation , Machine Learning

Publish: 2026-02-16 18:03:59 UTC


#14 Decoupled Continuous-Time Reinforcement Learning via Hamiltonian Flow [PDF] [Copy] [Kimi] [REL]

Author: Minh Nguyen

Many real-world control problems, ranging from finance to robotics, evolve in continuous time with non-uniform, event-driven decisions. Standard discrete-time reinforcement learning (RL), based on fixed-step Bellman updates, struggles in this setting: as time gaps shrink, the $Q$-function collapses to the value function $V$, eliminating action ranking. Existing continuous-time methods reintroduce action information via an advantage-rate function $q$. However, they enforce optimality through complicated martingale losses or orthogonality constraints, which are sensitive to the choice of test processes. These approaches entangle $V$ and $q$ into a large, complex optimization problem that is difficult to train reliably. To address these limitations, we propose a novel decoupled continuous-time actor-critic algorithm with alternating updates: $q$ is learned from diffusion generators on $V$, and $V$ is updated via a Hamiltonian-based value flow that remains informative under infinitesimal time steps, where standard max/softmax backups fail. Theoretically, we prove rigorous convergence via new probabilistic arguments, sidestepping the challenge that generator-based Hamiltonians lack Bellman-style contraction under the sup-norm. Empirically, our method outperforms prior continuous-time and leading discrete-time baselines across challenging continuous-control benchmarks and a real-world trading task, achieving 21% profit over a single quarter$-$nearly doubling the second-best method.

Subjects: Machine Learning , Artificial Intelligence , Optimization and Control , Statistics Theory

Publish: 2026-02-16 09:35:25 UTC


#15 Why Self-Training Helps and Hurts: Denoising vs. Signal Forgetting [PDF] [Copy] [Kimi] [REL]

Authors: Mingqi Wu, Archer Y. Yang, Qiang Sun

Iterative self-training (self-distillation) repeatedly refits a model on pseudo-labels generated by its own predictions. We study this procedure in overparameterized linear regression: an initial estimator is trained on noisy labels, and each subsequent iterate is trained on fresh covariates with noiseless pseudo-labels from the previous model. In the high-dimensional regime, we derive deterministic-equivalent recursions for the prediction risk and effective noise across iterations, and prove that the empirical quantities concentrate sharply around these limits. The recursion separates two competing forces: a systematic component that grows with iteration due to progressive signal forgetting, and a stochastic component that decays due to denoising via repeated data-dependent projections. Their interaction yields a $U$-shaped test-risk curve and an optimal early-stopping time. In spiked covariance models, iteration further acts as an iteration-dependent spectral filter that preserves strong eigendirections while suppressing weaker ones, inducing an implicit form of soft feature selection distinct from ridge regression. Finally, we propose an iterated generalized cross-validation criterion and prove its uniform consistency for estimating the risk along the self-training trajectory, enabling fully data-driven selection of the stopping time and regularization. Experiments on synthetic covariances validate the theory and illustrate the predicted denoising-forgetting trade-off.

Subjects: Machine Learning , Machine Learning , Statistics Theory

Publish: 2026-02-15 07:28:12 UTC


#16 Estimation and Inference of the Win Ratio for Two Hierarchical Endpoints Subject to Censoring and Missing Data [PDF] [Copy] [Kimi] [REL]

Authors: Yi Liu, Huiman Barnhart, Sean O'Brien, Yuliya Lokhnygina, Roland A. Matsouaka

The win ratio (WR) is a widely used metric to compare treatments in randomized clinical trials with hierarchically ordered endpoints. Counting-based approaches, such as Pocock's algorithm, are the standard for WR estimation. However, this algorithm treats participants with censored or missing data inadequately, which may lead to biased and inefficient estimates, particularly in the presence of heterogeneous censoring or missing data between treatment groups. Although recent extensions have addressed some of these limitations for hierarchical time-to-event endpoints, no existing methods -- aside from the computationally intensive multiple imputation approach -- can accommodate settings that include non-survival endpoints that are subject to missing data. In this paper, we propose a simple nonparametric maximum likelihood estimator (NPMLE) of WR for two hierarchical endpoints that are subject to censoring and missing data. Our method uses all observed data, avoids strong parametric assumptions, and comes with a closed-form asymptotic variance estimator. We demonstrate its performance using simulation studies and two data examples, based on the HEART-FID and ISCHEMIA trials. The proposed method provides a consistent estimator, improves estimation efficiency, and is robust under non-informative censoring and missing at random (MAR) assumptions, offering a flexible alternative to existing WR estimation methods. A user-friendly R package, WinRS, is available to facilitate implementation.

Subjects: Methodology , Statistics Theory , Applications

Publish: 2026-02-14 00:12:46 UTC