Statistics

2026-01-19 | Total: 34

#1 Optimal transport based theory for latent structured models

Authors: XuanLong Nguyen, Yun Wei

This article is an exposition on some recent theoretical advances in learning latent structured models, with a primary focus on the fundamental roles that optimal transport distances play in the statistical theory. We aim at what may be the most critical and novel ingredient in this theory: the motivation, formulation, derivation and ramification of inverse bounds, a rich collection of structural inequalities for latent structured models which connect the space of distributions of unobserved structures of interest to the space of distributions for observed data. This theory is illustrated on classical mixture models, as well as the more modern hierarchical models that have been developed in Bayesian statistics, machine learning and related fields.

Subject: Statistics Theory

Publish: 2026-01-16 17:47:52 UTC


#2 Smooth SCAD: A Raised Cosine SCAD Type Thresholding Rule for Wavelet Denoising

Authors: Radhika Kulkarni, Aluisio Pinheiro, Brani Vidakovic, Abdourrahmane M. Atto

We introduce a smooth variant of the SCAD thresholding rule for wavelet denoising by replacing its piecewise linear transition with a raised cosine. The resulting shrinkage function is odd, continuous on R, and continuously differentiable away from the main threshold, yet retains the hallmark SCAD properties of sparsity for small coefficients and near unbiasedness for large ones. This smoothness places the rule within the continuous thresholding class for which Stein's unbiased risk estimate is valid. As a result, unbiased risk computation, stable data-driven threshold selection, and the asymptotic theory of Kudryavtsev and Shestakov apply. A corresponding nonconvex prior is obtained whose posterior mode coincides with the estimator, yielding a transparent Bayesian interpretation. We give an explicit SURE risk expression, discuss the oracle scale of the optimal threshold, and describe both global and level-dependent adaptive versions. The smooth SCAD rule therefore offers a tractable refinement of SCAD, combining low bias, exact sparsity, and analytical convenience in a single wavelet shrinkage procedure.
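
For orientation, the classical SCAD thresholding rule that this abstract starts from can be sketched as below. This is the standard piecewise-linear rule with the conventional default `a = 3.7`, not the raised-cosine variant, whose exact form the abstract does not give. The function name `scad_threshold` is an illustrative choice.

```python
import math


def scad_threshold(x, lam, a=3.7):
    """Classical (piecewise-linear) SCAD thresholding of one coefficient.

    lam is the main threshold and a > 2 controls where shrinkage stops.
    This is NOT the smooth raised-cosine rule proposed in the paper.
    """
    ax = abs(x)
    sign = math.copysign(1.0, x)
    if ax <= lam:
        return 0.0                      # small coefficients are set to zero
    if ax <= 2.0 * lam:
        return sign * (ax - lam)        # soft-thresholding zone
    if ax <= a * lam:
        # linear transition between soft thresholding and the identity
        return ((a - 1.0) * x - sign * a * lam) / (a - 2.0)
    return x                            # large coefficients are left unchanged
```

The middle branch is the linear transition that, according to the abstract, the smooth rule replaces with a raised cosine.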

Subjects: Computation , Computational Engineering, Finance, and Science , Statistics Theory

Publish: 2026-01-16 17:35:02 UTC


#3 Fisher Scoring for Exact Matérn Covariance Estimation through Stable Smoothness Optimization

Authors: Yiping Hong, Sameh Abdulah, Marc G. Genton, Ying Sun

Gaussian Random Fields (GRFs) with Matérn covariance functions have emerged as a powerful framework for modeling spatial processes due to their flexibility in capturing different features of the spatial field. However, the smoothness parameter is challenging to estimate using maximum likelihood estimation (MLE), which involves evaluating the likelihood based on the full covariance matrix of the GRF, due to numerical instability. Moreover, MLE remains computationally prohibitive for large spatial datasets. To address this challenge, we propose the Fisher-BackTracking (Fisher-BT) method, which integrates the Fisher scoring algorithm with a backtracking line search strategy and adopts a series approximation for the modified Bessel function. This method enables an efficient MLE estimation for spatial datasets using the ExaGeoStat high-performance computing framework. Our proposed method not only reduces the number of iterations and accelerates convergence compared to derivative-free optimization methods but also improves the numerical stability of the smoothness parameter estimation. Through simulations and real-data analysis using a soil moisture dataset covering the Mississippi River Basin, we show that the proposed Fisher-BT method achieves accuracy comparable to existing approaches while significantly outperforming derivative-free algorithms such as BOBYQA and Nelder-Mead in terms of computational efficiency and numerical stability.

Subject: Computation

Publish: 2026-01-16 16:56:04 UTC


#4 Stein's method for the matrix normal distribution

Authors: Robert E. Gaunt, Frédéric Ouimet, Donald Richards

This work presents the first systematic development of Stein's method for matrix distributions. We establish the essential ingredients of Stein's method for matrix normal approximation: we derive a generator-based Stein identity from a matrix Ornstein-Uhlenbeck diffusion with two-sided scales, provide an explicit semigroup representation for the solution of the Stein equation, and obtain regularity estimates for the solution. The new methodology is illustrated with three statistical applications: smooth Wasserstein distance bounds to quantify the matrix central limit theorem, a Wasserstein distance bound for the matrix normal approximation of the centered matrix $T$ distribution, and the derivation of Stein's method-of-moments estimators for scale parameters of the matrix normal distribution.

Subjects: Statistics Theory , Probability

Publish: 2026-01-16 16:43:10 UTC


#5 Optimal e-values for testing the mean of a bounded random variable against a composite alternative

Authors: Sebastian Arnold, Eugenio Clerico

We derive the unique e-values with optimal (relative) growth rate in the worst case for testing the mean of a bounded random variable, thereby providing the first application of the (RE)GROW quality criteria for e-values, originally proposed by Grünwald et al. (2024), beyond the assumption of mutually absolutely continuous hypotheses. For both criteria, we explicitly characterise the alternatives that are most difficult to test against, which also admit a meaningful interpretation. We give two important examples where REGROW provides a powerful quality criterion for choosing optimal e-variables whereas GROW leads to trivial solutions.
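
As background for the setting above, a minimal sketch of a generic e-value for the mean of a variable bounded in [0, 1] is given here. It uses the well-known betting form `1 + lam * (x - m)` under the assumed null hypothesis that the mean equals `m`; it is not the optimal (RE)GROW e-value derived in the paper, and the name `betting_e_value` and the fixed `lam` are illustrative assumptions.

```python
def betting_e_value(xs, m, lam):
    """Product of 1 + lam * (x - m) over observations x in [0, 1].

    Assumed null hypothesis: E[X] = m with 0 < m < 1. Each factor is
    nonnegative and has expectation 1 under the null when lam lies in
    [-1 / (1 - m), 1 / m], so the product is an e-value for i.i.d. data.
    """
    if not 0.0 < m < 1.0:
        raise ValueError("m must lie strictly between 0 and 1")
    if not -1.0 / (1.0 - m) <= lam <= 1.0 / m:
        raise ValueError("lam outside the range that keeps factors nonnegative")
    e = 1.0
    for x in xs:
        if not 0.0 <= x <= 1.0:
            raise ValueError("observations must lie in [0, 1]")
        e *= 1.0 + lam * (x - m)
    return e
```

Large values of the product count as evidence against the null; how to choose `lam` well against a composite alternative is the question the paper addresses.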

Subject: Statistics Theory

Publish: 2026-01-16 14:53:15 UTC


#6 Deriving Complete Constraints in Hidden Variable Models

Authors: Michael C. Sachs, Erin E. Gabriel, Robin J. Evans, Arvid Sjölander

Hidden variable graphical models can sometimes imply constraints on the observable distribution that are more complex than simple conditional independence relations. These observable constraints can falsify assumptions of the model that would otherwise be untestable due to the unobserved variables and can be used to constrain estimation procedures to improve statistical efficiency. Knowing the complete set of observable constraints is thus ideal, but this can be difficult to determine in many settings. In models with categorical observed variables and a joint distribution that is completely characterized by linear relations to the unobservable response function variables, we develop a systematic method for deriving the complete set of observable constraints. We illustrate the method in several new settings, including ones that imply both inequality and equality constraints.

Subject: Methodology

Publish: 2026-01-16 12:42:49 UTC


#7 Estimation of time series by Maximum Mean Discrepancy

Authors: Pierre Alquier, Jean-David Fermanian, Benjamin Poignard

We define two minimum distance estimators for dependent data by minimizing approximate Maximum Mean Discrepancy distances between the empirical distribution of the observations and their assumed (parametric) model distribution. When the latter is intractable, it is approximated by simulation, which makes it possible to accommodate most dynamic processes with latent variables. We derive the non-asymptotic and large-sample properties of our estimators in the context of absolutely regular (beta-mixing) random elements. Our simulation experiments illustrate the robustness of our procedures to model misspecification, particularly in comparison with standard alternative estimation methods.
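
The idea of simulation-based minimum-MMD estimation can be illustrated with a toy sketch. The code below assumes i.i.d. scalar data, a Gaussian kernel, a hypothetical N(theta, 1) model, and a grid search over candidate values; the paper itself treats dependent data and more general models, so none of these choices should be read as the authors' estimators.

```python
import math
import random


def gaussian_kernel(x, y, bandwidth=1.0):
    """Gaussian kernel between two scalars."""
    return math.exp(-((x - y) ** 2) / (2.0 * bandwidth ** 2))


def mmd_squared(xs, ys, bandwidth=1.0):
    """Biased (V-statistic) estimate of the squared MMD between two samples."""
    def mean_kernel(a, b):
        total = sum(gaussian_kernel(u, v, bandwidth) for u in a for v in b)
        return total / (len(a) * len(b))
    return mean_kernel(xs, xs) + mean_kernel(ys, ys) - 2.0 * mean_kernel(xs, ys)


def mmd_location_estimate(data, candidates, n_sim=200, seed=0):
    """Return the candidate theta whose simulated N(theta, 1) sample is closest
    to the data in MMD. The same noise draws are reused for every candidate."""
    rng = random.Random(seed)
    noise = [rng.gauss(0.0, 1.0) for _ in range(n_sim)]
    return min(candidates,
               key=lambda th: mmd_squared(data, [th + z for z in noise]))
```

Reusing the same noise for every candidate keeps the objective smooth in theta, a common device when the model distribution is only available through simulation.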

Subject: Methodology

Publish: 2026-01-16 12:26:48 UTC


#8 TSQCA: Threshold-Sweep Qualitative Comparative Analysis in R

Author: Yuki Toyoda

Qualitative Comparative Analysis (QCA) requires researchers to choose calibration and dichotomization thresholds, and these choices can substantially affect truth tables, minimization, and resulting solution formulas. Despite this dependency, threshold sensitivity is often examined only in an ad hoc manner because repeated analyses are time-intensive and error-prone. We present TSQCA, an R package that automates threshold-sweep analyses by treating thresholds as explicit analytical variables. It provides four sweep functions (otSweep, ctSweepS, ctSweepM, dtSweep) to explore outcome thresholds, single-condition thresholds, multi-condition threshold grids, and joint outcome-condition threshold spaces, respectively. TSQCA integrates with the established CRAN package QCA for truth table construction and Boolean minimization, while returning structured S3 objects with consistent print/summary methods and optional detailed results. The package also supports automated Markdown report generation and configuration-chart output to facilitate reproducible documentation of cross-threshold results.

Subject: Methodology

Publish: 2026-01-16 12:19:01 UTC


#9 Robust $M$-Estimation of Scatter Matrices via Precision Structure Shrinkage

Authors: Soma Nikai, Yuichi Goto, Koji Tsukuda

Maronna's and Tyler's $M$-estimators are among the most widely used robust estimators for scatter matrices. However, when the dimension of observations is relatively high, their performance can substantially deteriorate in certain situations, particularly in the presence of clustered outliers. To address this issue, we propose an estimator that shrinks the estimated precision matrix toward the identity matrix. We derive a sufficient condition for its existence, discuss its statistical interpretation, and establish upper and lower bounds for its breakdown point. Numerical experiments confirm robustness of the proposed method.

Subjects: Methodology , Statistics Theory , Computation

Publish: 2026-01-16 08:59:10 UTC


#10 Split-and-Conquer: Distributed Factor Modeling for High-Dimensional Matrix-Variate Time Series

Authors: Hangjin Jiang, Yuzhou Li, Zhaoxing Gao

In this paper, we propose a distributed framework for reducing the dimensionality of high-dimensional, large-scale, heterogeneous matrix-variate time series data using a factor model. The data are first partitioned column-wise (or row-wise) and allocated to node servers, where each node estimates the row (or column) loading matrix via two-dimensional tensor PCA. These local estimates are then transmitted to a central server and aggregated, followed by a final PCA step to obtain the global row (or column) loading matrix estimator. Given the estimated loading matrices, the corresponding factor matrices are subsequently computed. Unlike existing distributed approaches, our framework preserves the latent matrix structure, thereby improving computational efficiency and enhancing information utilization. We also discuss row- and column-wise clustering procedures for settings in which the group memberships are unknown. Furthermore, we extend the analysis to unit-root nonstationary matrix-variate time series. Asymptotic properties of the proposed method are derived for the diverging dimension of the data in each computing unit and the sample size $T$. Simulation results assess the computational efficiency and estimation accuracy of the proposed framework, and real data applications further validate its predictive performance.

Subject: Machine Learning

Publish: 2026-01-16 08:42:14 UTC


#11 Sub-Cauchy Sampling: Escaping the Dark Side of the Moon

Authors: Sebastiano Grazzi, Sifan Liu, Gareth O. Roberts, Jun Yang

We introduce a Markov chain Monte Carlo algorithm based on Sub-Cauchy Projection, a geometric transformation that generalizes stereographic projection by mapping Euclidean space into a spherical cap of a hyper-sphere, referred to as the complement of the dark side of the moon. We prove that our proposed method is uniformly ergodic for sub-Cauchy targets, namely targets whose tails are at most as heavy as a multidimensional Cauchy distribution, and show empirically its performance for challenging high-dimensional problems. The simplicity and broad applicability of our approach open new opportunities for Bayesian modeling and computation with heavy-tailed distributions in settings where most existing methods are unreliable.
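
The classical stereographic projection that the abstract says is being generalized can be sketched as follows. This covers only the standard map between Euclidean space and the unit sphere, not the Sub-Cauchy Projection onto a spherical cap, whose definition is not given in the abstract; the function names are illustrative.

```python
def to_sphere(x):
    """Map a point x in R^d to the unit sphere in R^(d+1) by inverse
    stereographic projection (the origin maps to the south pole)."""
    s = sum(v * v for v in x)
    return [2.0 * v / (s + 1.0) for v in x] + [(s - 1.0) / (s + 1.0)]


def from_sphere(z):
    """Map a point z on the unit sphere (other than the north pole) back to R^d."""
    denom = 1.0 - z[-1]
    return [v / denom for v in z[:-1]]
```

Points far from the origin are sent close to the north pole, which is why heavy tails in Euclidean space become a local feature near a single point on the sphere.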

Subjects: Computation , Methodology

Publish: 2026-01-16 08:02:52 UTC


#12 Noise-resilient penalty operators based on statistical differentiation schemes

Authors: Marc Vidal, Yves Rosseel

Penalized smoothing is a standard tool in regression analysis. Classical approaches often rely on basis or kernel expansions, which constrain the estimator to a fixed span and impose smoothness assumptions that may be restrictive for discretely observed data. We introduce a class of penalized estimators that operate directly on the data grid, denoising sampled trajectories under minimal smoothness assumptions by penalizing local roughness through statistically calibrated difference operators. Some distributional and asymptotic properties of sample-based contrast statistics associated with the resulting linear smoothers are established under Hellinger differentiability of the model, without requiring Fréchet differentiability in function space. Simulation results confirm that the proposed estimators perform competitively across both smooth and locally irregular settings.

Subject: Statistics Theory

Publish: 2026-01-16 06:56:56 UTC


#13 Contextual Distributionally Robust Optimization with Causal and Continuous Structure: An Interpretable and Tractable Approach

Authors: Fenglin Zhang, Jie Wang

In this paper, we introduce a framework for contextual distributionally robust optimization (DRO) that considers the causal and continuous structure of the underlying distribution by developing interpretable and tractable decision rules that prescribe decisions using covariates. We first introduce the causal Sinkhorn discrepancy (CSD), an entropy-regularized causal Wasserstein distance that encourages continuous transport plans while preserving the causal consistency. We then formulate a contextual DRO model with a CSD-based ambiguity set, termed Causal Sinkhorn DRO (Causal-SDRO), and derive its strong dual reformulation where the worst-case distribution is characterized as a mixture of Gibbs distributions. To solve the corresponding infinite-dimensional policy optimization, we propose the Soft Regression Forest (SRF) decision rule, which approximates optimal policies within arbitrary measurable function spaces. The SRF preserves the interpretability of classical decision trees while being fully parametric, differentiable, and Lipschitz smooth, enabling intrinsic interpretation from both global and local perspectives. To solve the Causal-SDRO with parametric decision rules, we develop an efficient stochastic compositional gradient algorithm that converges to an $\varepsilon$-stationary point at a rate of $O(\varepsilon^{-4})$, matching the convergence rate of standard stochastic gradient descent. Finally, we validate our method through numerical experiments on synthetic and real-world datasets, demonstrating its superior performance and interpretability.

Subjects: Machine Learning , Artificial Intelligence , Optimization and Control

Publish: 2026-01-16 06:18:22 UTC


#14 Generalized Heterogeneous Functional Model with Applications to Large-scale Mobile Health Data

Authors: Xiaojing Sun, Bingxin Zhao, Fei Xue

Physical activity is crucial for human health. With the increasing availability of large-scale mobile health data, strong associations have been found between physical activity and various diseases. However, accurately capturing this complex relationship is challenging, possibly because it varies across different subgroups of subjects, especially in large-scale datasets. To fill this gap, we propose a generalized heterogeneous functional method that simultaneously estimates functional effects and identifies subgroups within the generalized functional regression framework. The proposed method captures subgroup-specific functional relationships between physical activity and diseases, providing a more nuanced understanding of these associations. Additionally, we develop a pre-clustering method that enhances computational efficiency for large-scale data through a finer partition of subjects compared to true subgroups. We further introduce a testing procedure to assess whether the different subgroups exhibit distinct functional effects. In the real data application, we examine the impact of physical activity on the risk of dementia using the UK Biobank dataset, which includes over 96,433 participants. Our proposed method outperforms existing methods in future-day prediction accuracy and identifies three distinct subgroups, with detailed scientific interpretations for each subgroup. We also demonstrate the theoretical consistency of our methods. Code implementing the proposed method is available at: https://github.com/xiaojing777/GHFM.

Subject: Methodology

Publish: 2026-01-16 05:00:21 UTC


#15 Memorize Early, Then Query: Inlier-Memorization-Guided Active Outlier Detection

Authors: Minseo Kang, Seunghwan Park, Dongha Kim

Outlier detection (OD) aims to identify abnormal instances, known as outliers or anomalies, by learning typical patterns of normal data, or inliers. Performing OD under an unsupervised regime, without any information about anomalous instances in the training data, is challenging. A recently observed phenomenon, known as the inlier-memorization (IM) effect, where deep generative models (DGMs) tend to memorize inlier patterns during early training, provides a promising signal for distinguishing outliers. However, existing unsupervised approaches that rely solely on the IM effect still struggle when inliers and outliers are not well-separated or when outliers form dense clusters. To address these limitations, we incorporate active learning to selectively acquire informative labels, and propose IMBoost, a novel framework that explicitly reinforces the IM effect to improve outlier detection. Our method consists of two stages: 1) a warm-up phase that induces and promotes the IM effect, and 2) a polarization phase in which actively queried samples are used to maximize the discrepancy between inlier and outlier scores. In particular, we propose a novel query strategy and tailored loss function in the polarization phase to effectively identify informative samples and fully leverage the limited labeling budget. We provide a theoretical analysis showing that IMBoost consistently decreases inlier risk while increasing outlier risk throughout training, thereby amplifying their separation. Extensive experiments on diverse benchmark datasets demonstrate that IMBoost not only significantly outperforms state-of-the-art active OD methods but also incurs substantially lower computational cost.

Subject: Machine Learning

Publish: 2026-01-16 04:55:46 UTC


#16 Analyzing Residential Speeding Using Connected Vehicle Data: A Case Study in Charlottesville, VA Area

Authors: Shi Feng, B. Brian Park, Andrew Mondschein

This study uses connected vehicle data to analyze speeding behavior on residential roads. A scalable pipeline processes trajectory data and supplements missing speed limits to generate summaries at OpenStreetMap's way ID level. The findings reveal a highly skewed distribution of both aggressive and reckless speeding. Based on a case study of connected vehicle data on residential roads in Charlottesville, VA, we found that 38% of segments had at least one instance of aggressive speeding, and 20% had at least one instance of reckless speeding. In addition, nighttime speeding is 27 times more prevalent than daytime speeding, and extreme violations on specific road segments highlight how severe the issue can be. Several segments rank among the top 10 for both aggressive and reckless speeding, indicating that there are high-risk residential roads. These findings support the need for both spatial and behavioral interventions. The analysis provides a rich foundation for policy and planning, offering a valuable complement to traditional enforcement and planning tools. In conclusion, this framework sets the foundation for future applications in traffic safety analytics, demonstrating the growing potential of telematics data to inform safer, more livable communities.

Subject: Applications

Publish: 2026-01-16 03:37:23 UTC


#17 A Note on Harmonic Underspecification in Log-Normal Trigonometric Regression

Author: Michael T. Gorczyca

Analysis of biological rhythm data often involves performing least squares trigonometric regression, which models the oscillations of a response over time as a sum of sinusoidal components. When the response is not normally distributed, an investigator will either transform the response before applying least squares trigonometric regression or extend trigonometric regression to a generalized linear model (GLM) framework. In this note, we compare these two approaches when the number of oscillation harmonics is underspecified. We assume data are sampled under an equispaced experimental design and that a log link function would be appropriate for a GLM. We show that when the response follows a generalized gamma distribution, least squares trigonometric regression with a log-transformed response, or log-normal trigonometric regression, produces unbiased parameter estimates for the oscillation harmonics, even when the number of oscillation harmonics is underspecified. In contrast, GLMs require correct specification to produce unbiased parameter estimates. We apply both methods to cortisol level data and find that only log-normal trigonometric regression produces parameter estimates that are invariant to the number of specified oscillation harmonics. Additionally, when a sufficiently large number of oscillation harmonics is specified, both methods produce identical parameter estimates for the oscillation harmonics.

Subject: Applications

Publish: 2026-01-16 00:40:03 UTC


#18 On the use of cross-fitting in causal machine learning with correlated units

Authors: Salvador V. Balkus, Hasan Laith, Nima S. Hejazi

In causal machine learning, the fitting and evaluation of nuisance models are typically performed on separate partitions, or folds, of the observed data. This technique, called cross-fitting, eliminates bias introduced by the use of black-box predictive algorithms. When study units may be correlated, such as in spatial, clustered, or time-series data, investigators often design bespoke forms of cross-fitting to minimize correlation between folds. We prove that, perhaps contrary to popular belief, this is typically unnecessary: performing cross-fitting as if study units were independent usually still eliminates key bias terms even when units may be correlated. In simulation experiments with various correlation structures, we show that causal machine learning estimators typically have the same or improved bias and precision under cross-fitting that ignores correlation compared to techniques striving to eliminate correlation between folds.
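
A minimal sketch of plain cross-fitting, with folds assigned as if units were independent, is shown below. The helper names (`make_folds`, `cross_fit_predictions`, `fit_mean`) and the constant-mean nuisance model are hypothetical stand-ins for illustration, not code from the paper.

```python
import random


def make_folds(n, k, seed=0):
    """Randomly partition indices 0..n-1 into k folds, ignoring any correlation."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def cross_fit_predictions(xs, ys, fit, k=5, seed=0):
    """Predict each unit with a nuisance model fitted on the other folds only."""
    n = len(xs)
    preds = [None] * n
    for fold in make_folds(n, k, seed):
        held_out = set(fold)
        train = [i for i in range(n) if i not in held_out]
        model = fit([xs[i] for i in train], [ys[i] for i in train])
        for i in fold:
            preds[i] = model(xs[i])
    return preds


def fit_mean(xs, ys):
    """Toy nuisance learner: predicts the training mean of y for any x."""
    mu = sum(ys) / len(ys)
    return lambda x: mu
```

Any learner with the same `fit(xs, ys) -> predictor` shape can be passed in place of `fit_mean`; the point is only that no unit is predicted by a model trained on it.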

Subject: Methodology

Publish: 2026-01-15 22:57:16 UTC


#19 Locally sparse varying coefficient mixed model with application to longitudinal microbiome differential abundance

Authors: Simon Fontaine, Nisha J. D'Silva, Marcell Costa de Medeiros, Grace Y. Chen, Ji Zhu, Gen Li

Differential abundance (DA) analysis in microbiome studies has recently been used to uncover a plethora of associations between microbial composition and various health conditions. While current approaches to DA typically apply only to cross-sectional data, many studies feature a longitudinal design to better understand the underlying microbial dynamics. To study DA in longitudinal microbial studies, we introduce a novel varying coefficient mixed-effects model with local sparsity. The proposed method can identify time intervals of significant group differences while accounting for temporal dependence. Specifically, we exploit a penalized kernel smoothing approach for parameter estimation and include a random effect to account for serial correlation. In particular, our method operates effectively regardless of whether sampling times are shared across subjects, accommodating irregular sampling and missing observations. Simulation studies demonstrate the necessity of modeling dependence for precise estimation and support recovery. The application of our method to a longitudinal study of the mouse oral microbiome during cancer development revealed significant scientific insights that were not discernible through cross-sectional analyses. An R implementation is available at https://github.com/fontaine618/LSVCMM.

Subjects: Methodology , Applications

Publish: 2026-01-15 21:53:38 UTC


#20 Rigidity theory in statistical inference

Author: Daniel Irving Bernstein

In this expository article, we summarize what is known about maximum likelihood thresholds of Gaussian models, paying special attention to connections with rigidity theory.

Subjects: Statistics Theory , Combinatorics

Publish: 2026-01-15 21:31:28 UTC


#21 Mass Distribution versus Density Distribution in the Context of Clustering

Authors: Kai Ming Ting, Ye Zhu, Hang Zhang, Tianrun Liang

This paper investigates two fundamental descriptors of data, density distribution and mass distribution, in the context of clustering. Density distribution has been the de facto descriptor of data distribution since the introduction of statistics. We show that density distribution has a fundamental limitation, high-density bias, irrespective of the algorithms used to perform clustering. Existing density-based clustering algorithms have employed different algorithmic means to counter the effect of the high-density bias with some success, but the fundamental limitation of using density distribution remains an obstacle to discovering clusters of arbitrary shapes, sizes and densities. Using the mass distribution as a better foundation, we propose a new algorithm that maximizes the total mass of all clusters, called mass-maximization clustering (MMC). The algorithm can be easily changed to maximize the total density of all clusters in order to examine the fundamental limitation of using density distribution versus mass distribution. The key advantage of MMC over density-maximization clustering is that the maximization is conducted without a bias towards dense clusters.

Subject: Machine Learning

Publish: 2026-01-14 03:55:35 UTC


#22 Temporal Complexity and Self-Organization in an Exponential Dense Associative Memory Model

Authors: Marco Cafiso, Paolo Paradisi

Dense Associative Memory (DAM) models generalize the classical Hopfield model by incorporating n-body or exponential interactions that greatly enhance storage capacity. While the criticality of DAM models has been extensively investigated, mainly within a statistical equilibrium picture, little attention has been devoted to the temporal self-organizing behavior induced by learning. In this work, we investigate the behavior of a stochastic exponential DAM (SEDAM) model through the lens of Temporal Complexity (TC), a framework that characterizes complex systems by intermittent transition events between order and disorder and by scale-free temporal statistics. Transition events associated with the birth and death of neural avalanche structures are exploited for the TC analyses and compared with analogous transition events based on coincidence structures. We systematically explore how TC indicators depend on control parameters, i.e., noise intensity and memory load. Our results reveal that the SEDAM model exhibits regimes of complex intermittency characterized by nontrivial temporal correlations and scale-free behavior, indicating the spontaneous emergence of self-organizing dynamics. These regimes emerge in small intervals of noise intensity values, which, in agreement with the extended criticality concept, never shrink to a single critical point. Further, the noise intensity range needed to reach the critical region, where self-organizing behavior emerges, slightly decreases as the memory load increases. This study highlights the relevance of TC as a complementary framework for understanding learning and information processing in artificial and biological neural systems, revealing the link between the memory load and the self-organizing capacity of the network.

Subjects: Adaptation and Self-Organizing Systems , Applied Physics , Computational Physics , Data Analysis, Statistics and Probability , Machine Learning

Publish: 2026-01-16 18:01:14 UTC


#23 When Are Two Scores Better Than One? Investigating Ensembles of Diffusion Models

Authors: Raphaël Razafindralambo, Rémy Sun, Frédéric Precioso, Damien Garreau, Pierre-Alexandre Mattei

Diffusion models now generate high-quality, diverse samples, with an increasing focus on more powerful models. Although ensembling is a well-known way to improve supervised models, its application to unconditional score-based diffusion models remains largely unexplored. In this work we investigate whether it provides tangible benefits for generative modelling. We find that while ensembling the scores generally improves the score-matching loss and model likelihood, it fails to consistently enhance perceptual quality metrics such as FID on image datasets. We confirm this observation across a breadth of aggregation rules using Deep Ensembles and Monte Carlo Dropout on CIFAR-10 and FFHQ. We attempt to explain this discrepancy by investigating possible causes, such as the link between score estimation and image quality. We also look into tabular data through random forests, and find that one aggregation strategy outperforms the others. Finally, we provide theoretical insights into the summing of score models, which shed light not only on ensembling but also on several model composition techniques (e.g. guidance).

Subjects: Machine Learning , Computer Vision and Pattern Recognition , Statistics Theory , Methodology

Publish: 2026-01-16 17:07:25 UTC


#24 Statistical Robustness of Interval CVaR Based Regression Models under Perturbation and Contamination

Authors: Yulei You, Junyi Liu

Robustness under perturbation and contamination is a prominent issue in statistical learning. We address robust nonlinear regression based on the so-called interval conditional value-at-risk (In-CVaR), which is introduced to enhance robustness by trimming extreme losses. While recent literature shows that In-CVaR based statistical learning exhibits better robustness than classical robust regression models, its theoretical robustness analysis for nonlinear regression remains largely unexplored. We rigorously quantify robustness under contamination, with a unified study of the distributional breakdown point for a broad class of regression models, including linear, piecewise affine and neural network models with $\ell_1$, $\ell_2$ and Huber losses. Moreover, we analyze the qualitative robustness of the In-CVaR based estimator under perturbation. We show that under several minor assumptions, the In-CVaR based estimator is qualitatively robust in terms of the Prokhorov metric if and only if the largest portion of losses is trimmed. Overall, this study analyzes robustness properties of In-CVaR based nonlinear regression models under both perturbation and contamination, which illustrates the advantages of the In-CVaR risk measure over conditional value-at-risk and expectation for robust regression in both theory and numerical experiments.
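
To make the trimming idea concrete, here is a rough empirical sketch that averages the sorted losses lying between two quantile levels. It assumes one common reading of In-CVaR as the mean loss between the `alpha` and `beta` quantiles and uses a simple rank-based discretization; the function name and the rounding convention are illustrative choices, not the paper's definition.

```python
import math


def interval_cvar(losses, alpha, beta):
    """Average of the sorted losses with ranks between alpha*n and beta*n.

    alpha = 0, beta = 1 gives the plain mean; beta < 1 discards the largest
    losses, and alpha > 0 discards the smallest ones.
    """
    if not 0.0 <= alpha < beta <= 1.0:
        raise ValueError("need 0 <= alpha < beta <= 1")
    if not losses:
        raise ValueError("losses must be non-empty")
    ordered = sorted(losses)
    n = len(ordered)
    lo = int(math.floor(alpha * n))   # first rank kept
    hi = int(math.ceil(beta * n))     # one past the last rank kept
    kept = ordered[lo:hi]
    return sum(kept) / len(kept)
```

With `beta < 1` a single extreme loss no longer moves the objective, which matches the abstract's remark that trimming the largest portion of losses is what matters for qualitative robustness.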

Subjects: Optimization and Control , Machine Learning

Publish: 2026-01-16 16:41:57 UTC


#25 Likelihood-Based Ergodicity Transformations in Time Series Analysis

Author: Anthony Britto

Time series often exhibit non-ergodic behaviour that complicates forecasting and inference. This article proposes a likelihood-based approach for estimating ergodicity transformations that addresses such challenges. The method is broadly compatible with standard models, including Gaussian processes, ARMA, and GARCH. A detailed simulation study using geometric and arithmetic Brownian motion demonstrates the ability of the approach to recover known ergodicity transformations. A further case study on the large macroeconomic database FRED-QD shows that incorporating ergodicity transformations can provide meaningful improvements over conventional transformations or naive specifications in applied work.

Subjects: Econometrics , Methodology

Publish: 2026-01-16 12:30:51 UTC