Statistics

2025-12-05 | Total: 26

#1 Foundations of Diffusion Models in General State Spaces: A Self-Contained Introduction

Authors: Vincent Pauline, Tobias Höppe, Kirill Neklyudov, Alexander Tong, Stefan Bauer, Andrea Dittadi

Although diffusion models now occupy a central place in generative modeling, introductory treatments commonly assume Euclidean data and seldom clarify their connection to discrete-state analogues. This article is a self-contained primer on diffusion over general state spaces, unifying continuous domains and discrete/categorical structures under one lens. We develop the discrete-time view (forward noising via Markov kernels and learned reverse dynamics) alongside its continuous-time limits -- stochastic differential equations (SDEs) in $\mathbb{R}^d$ and continuous-time Markov chains (CTMCs) on finite alphabets -- and derive the associated Fokker--Planck and master equations. A common variational treatment yields the ELBO that underpins standard training losses. We make explicit how forward corruption choices -- Gaussian processes in continuous spaces and structured categorical transition kernels (uniform, masking/absorbing and more) in discrete spaces -- shape reverse dynamics and the ELBO. The presentation is layered for three audiences: newcomers seeking a self-contained intuitive introduction; diffusion practitioners wanting a global theoretical synthesis; and continuous-diffusion experts looking for an analogy-first path into discrete diffusion. The result is a unified roadmap to modern diffusion methodology across continuous domains and discrete sequences, highlighting a compact set of reusable proofs, identities, and core theoretical principles.

Subjects: Machine Learning, Machine Learning

Publish: 2025-12-04 18:55:36 UTC
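
The abstract of #1 names uniform and masking/absorbing transition kernels as forward corruption choices on discrete state spaces. As a rough illustration only (not taken from the paper), the sketch below builds one-step versions of both kernels as row-stochastic matrices. The function names, the corruption probability `beta`, and the choice of `mask_state` are assumptions made for this example.

```python
def uniform_kernel(num_states, beta):
    """One-step kernel: with probability beta resample the state uniformly
    over all categories, otherwise keep it (hypothetical helper)."""
    return [
        [(1.0 - beta) * (i == j) + beta / num_states for j in range(num_states)]
        for i in range(num_states)
    ]


def absorbing_kernel(num_states, beta, mask_state):
    """One-step kernel: with probability beta jump to mask_state, which is
    absorbing (once masked, always masked)."""
    rows = []
    for i in range(num_states):
        row = [0.0] * num_states
        if i == mask_state:
            row[i] = 1.0
        else:
            row[i] = 1.0 - beta
            row[mask_state] = beta
        rows.append(row)
    return rows


def step_distribution(probs, kernel):
    """Push a row vector of state probabilities through one forward step."""
    n = len(probs)
    return [sum(probs[i] * kernel[i][j] for i in range(n)) for j in range(n)]
```

Applying `step_distribution` repeatedly with the absorbing kernel moves all probability mass toward the mask state, whereas the uniform kernel drives the distribution toward uniform; this is the sense in which the forward choice fixes the reference distribution that the reverse process starts from.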


#2 Control Consistency Losses for Diffusion Bridges

Authors: Samuel Howard, Nikolas Nüsken, Jakiw Pidstrigach

Simulating the conditioned dynamics of diffusion processes, given their initial and terminal states, is an important but challenging problem in the sciences. The difficulty is particularly pronounced for rare events, for which the unconditioned dynamics rarely reach the terminal state. In this work, we leverage a self-consistency property of the conditioned dynamics to learn the diffusion bridge in an iterative online manner, and demonstrate promising empirical results in a range of settings.

Subjects: Machine Learning, Machine Learning

Publish: 2025-12-04 18:31:39 UTC


#3 Model-Free Assessment of Simulator Fidelity via Quantile Curves

Authors: Garud Iyengar, Yu-Shiou Willy Lin, Kaizheng Wang

Simulation of complex systems originated in manufacturing and queuing applications. It is now widely used for large-scale, ML-based systems in research, education, and consumer surveys. However, characterizing the discrepancy between simulators and ground truth remains challenging for increasingly complex, machine-learning-based systems. We propose a computationally tractable method to estimate the quantile function of the discrepancy between the simulated and ground-truth outcome distributions. Our approach focuses on output uncertainty and treats the simulator as a black box, imposing no modeling assumptions on its internals, and hence applies broadly across many parameter families, from Bernoulli and multinomial models to continuous, vector-valued settings. The resulting quantile curve supports confidence interval construction for unseen scenarios, risk-aware summaries of sim-to-real discrepancy (e.g., VaR/CVaR), and comparison of simulators' performance. We demonstrate our methodology in an application assessing LLM simulation fidelity on the WorldValueBench dataset spanning four LLMs.

Subjects: Methodology, Artificial Intelligence, Machine Learning

Publish: 2025-12-04 17:39:51 UTC


#4 Towards a unified framework for guided diffusion models

Authors: Yuchen Jiao, Yuxin Chen, Gen Li

Guided or controlled data generation with diffusion models has become a cornerstone of modern generative modeling. Despite substantial advances in diffusion model theory, the theoretical understanding of guided diffusion samplers remains severely limited. We make progress by developing a unified algorithmic and theoretical framework that accommodates both diffusion guidance and reward-guided diffusion. Aimed at fine-tuning diffusion models to improve certain rewards, we propose injecting a reward guidance term -- constructed from the difference between the original and reward-reweighted scores -- into the backward diffusion process, and rigorously quantify the resulting reward improvement over the unguided counterpart. As a key application, our framework shows that classifier-free guidance (CFG) decreases the expected reciprocal of the classifier probability, providing the first theoretical characterization of the specific performance metric that CFG improves for general target distributions. When applied to reward-guided diffusion, our framework yields a new sampler that is easy to train and requires no full diffusion trajectories during training. Numerical experiments further corroborate our theoretical findings. Partial preliminary results of this work appeared at the International Conference on Machine Learning 2025 (li2025provable).

Subjects: Machine Learning, Machine Learning, Statistics Theory

Publish: 2025-12-04 16:55:20 UTC


#5 Learning Causality for Longitudinal Data

Author: Mouad EL Bouchattaoui

This thesis develops methods for causal inference and causal representation learning (CRL) in high-dimensional, time-varying data. The first contribution introduces the Causal Dynamic Variational Autoencoder (CDVAE), a model for estimating Individual Treatment Effects (ITEs) by capturing unobserved heterogeneity in treatment response driven by latent risk factors that affect only outcomes. CDVAE comes with theoretical guarantees on valid latent adjustment and generalization bounds for ITE error. Experiments on synthetic and real datasets show that CDVAE outperforms baselines, and that state-of-the-art models greatly improve when augmented with its latent substitutes, approaching oracle performance without access to true adjustment variables. The second contribution proposes an efficient framework for long-term counterfactual regression based on RNNs enhanced with Contrastive Predictive Coding (CPC) and InfoMax. It captures long-range dependencies under time-varying confounding while avoiding the computational cost of transformers, achieving state-of-the-art results and introducing CPC into causal inference. The third contribution advances CRL by addressing how latent causes manifest in observed variables. We introduce a model-agnostic interpretability layer based on the geometry of the decoder Jacobian. A sparse self-expression prior induces modular, possibly overlapping groups of observed features aligned with shared latent influences. We provide recovery guarantees in both disjoint and overlapping settings and show that meaningful latent-to-observed structure can be recovered without anchor features or single-parent assumptions. Scalable Jacobian-based regularization techniques are also developed.

Subjects: Machine Learning, Information Theory, Machine Learning

Publish: 2025-12-04 16:51:49 UTC


#6 Concentration bounds for intrinsic dimension estimation using Gaussian kernels

Author: Martin Andersson

We prove finite-sample concentration and anti-concentration bounds for dimension estimation using Gaussian kernel sums. Our bounds provide explicit dependence on sample size, bandwidth, and local geometric and distributional parameters, characterizing precisely how regularity conditions govern statistical performance. We also propose a bandwidth selection heuristic using derivative information, which shows promise in numerical experiments.

Subjects: Statistics Theory, Machine Learning

Publish: 2025-12-04 14:45:08 UTC


#7 Clustering country-level all-cause mortality data: a review

Authors: Pedro Menezes de Araujo, Isobel Claire Gormley, Thomas Brendan Murphy

Mortality data are relevant to demography, public health, and actuarial science. Whilst clustering is increasingly used to explore patterns in such data, no study has reviewed its application to country-level all-cause mortality. This review therefore summarises recent work and addresses key questions: why clustering is used, which mortality data are analysed, which methods are most common, and what main findings emerge. To address these questions, we examine studies applying clustering to country-level all-cause mortality, focusing on mortality indices, data sources, and methodological choices, and we replicate some approaches using Human Mortality Database (HMD) data. Our analysis reveals that clustering is mainly motivated by forecasting and by studying convergence and inequality. Most studies use HMD data from developed countries and rely on k-means, hierarchical, or functional clustering. Main findings include a persistent East-West European division across applications, with clustering generally improving forecast accuracy over single-country models. Overall, this review highlights the methodological range in the literature, summarises clustering results, and identifies gaps, such as the limited evaluation of clustering quality and the underuse of data from countries outside the high-income world.

Subject: Applications

Publish: 2025-12-04 14:15:37 UTC


#8 Provable FDR Control for Deep Feature Selection: Deep MLPs and Beyond

Author: Kazuma Sawaya

We develop a flexible feature selection framework based on deep neural networks that approximately controls the false discovery rate (FDR), a measure of Type-I error. The method applies to architectures whose first layer is fully connected. From the second layer onward, it accommodates multilayer perceptrons (MLPs) of arbitrary width and depth, convolutional and recurrent networks, attention mechanisms, residual connections, and dropout. The procedure also accommodates stochastic gradient descent with data-independent initializations and learning rates. To the best of our knowledge, this is the first work to provide a theoretical guarantee of FDR control for feature selection within such a general deep learning setting. Our analysis is built upon a multi-index data-generating model and an asymptotic regime in which the feature dimension $n$ diverges faster than the latent dimension $q^{*}$, while the sample size, the number of training iterations, the network depth, and hidden layer widths are left unrestricted. Under this setting, we show that each coordinate of the gradient-based feature-importance vector admits a marginal normal approximation, thereby supporting the validity of asymptotic FDR control. As a theoretical limitation, we assume $\mathbf{B}$-right orthogonal invariance of the design matrix, and we discuss broader generalizations. We also present numerical experiments that underscore the theoretical findings.

Subjects: Machine Learning, Machine Learning, Statistics Theory

Publish: 2025-12-04 11:46:06 UTC


#9 Recurrent Neural Networks with Linear Structures for Electricity Price Forecasting

Authors: Souhir Ben Amor, Florian Ziel

We present a novel recurrent neural network architecture designed explicitly for day-ahead electricity price forecasting, aimed at improving short-term decision-making and operational management in energy systems. Our combined forecasting model embeds linear structures, such as expert models and Kalman filters, into recurrent networks, enabling efficient computation and enhanced interpretability. The design leverages the strengths of both linear and non-linear model structures, allowing it to capture all relevant stylised price characteristics in power markets, including calendar and autoregressive effects, as well as influences from load, renewable energy, and related fuel and carbon markets. For empirical testing, we use hourly data from the largest European electricity market spanning 2018 to 2025 in a comprehensive forecasting study, comparing our model against state-of-the-art approaches, particularly high-dimensional linear and neural network models. The proposed model achieves approximately 12% higher accuracy than leading benchmarks. We evaluate the contributions of the interpretable model components and conclude on the impact of combining linear and non-linear structures.

Subjects: Machine Learning, Machine Learning

Publish: 2025-12-04 11:38:30 UTC


#10 Tensor Neyman-Pearson Classification: Theory, Algorithms, and Error Control

Authors: Lingchong Liu, Elynn Chen, Yuefeng Han, Lucy Xia

Biochemical discovery increasingly relies on classifying molecular structures when the consequences of different errors are highly asymmetric. In mutagenicity and carcinogenicity, misclassifying a harmful compound as benign can trigger substantial scientific, regulatory, and health risks, whereas false alarms primarily increase laboratory workload. Modern representations transform molecular graphs into persistence image tensors that preserve multiscale geometric and topological structure, yet existing tensor classifiers and deep tensor neural networks provide no finite-sample guarantees on type I error and often exhibit severe error inflation in practice. We develop the first Tensor Neyman-Pearson (Tensor-NP) classification framework that achieves finite-sample control of type I error while exploiting the multi-mode structure of tensor data. Under a tensor-normal mixture model, we derive the oracle NP discriminant, characterize its Tucker low-rank manifold geometry, and establish tensor-specific margin and conditional detection conditions enabling high-probability bounds on excess type II error. We further propose a Discriminant Tensor Iterative Projection estimator and a Tensor-NP Neural Classifier combining deep learning with Tensor-NP umbrella calibration, yielding the first distribution-free NP-valid methods for multiway data. Across four biochemical datasets, Tensor-NP classifiers maintain type I errors at prespecified levels while delivering competitive type II error performance, providing reliable tools for asymmetric-risk decisions with complex molecular tensors.

Subject: Methodology

Publish: 2025-12-04 08:54:46 UTC


#11 Bayesian Graphical High-Dimensional Time Series Models for Detecting Structural Changes

Authors: Shuvrarghya Ghosh, Arkaprava Roy, Anindya Roy, Subhashis Ghosal

We study the structural changes in multivariate time series by estimating and comparing stationary graphs for macroeconomic time series before and after an economic crisis such as the Great Recession. Building on a latent time series framework called Orthogonally-rotated Univariate Time-series (OUT), we propose a shared-parameter framework -- the spOUT autoregressive model (spOUTAR) -- that jointly models two related multivariate time series and enables coherent Bayesian estimation of their corresponding stationary precision matrices. This framework provides a principled mechanism to detect and quantify which conditional relationships among the variables changed or formed following the crisis. Specifically, we study the impact of the Great Recession (December 2007 to June 2009), which substantially disrupted global and national economies, prompting long-lasting shifts in macroeconomic indicators and their interrelationships. While many studies document its economic consequences, far less is known about how the underlying conditional dependency structure among economic variables changed as economies moved from pre-crisis stability through the shock and back to normalcy. Using the proposed approach to analyze U.S. and OECD macroeconomic data, we demonstrate that spOUTAR effectively captures recession-induced changes in stationary graphical structure, offering a flexible and interpretable tool for studying structural shifts in economic systems.

Subjects: Methodology, Applications

Publish: 2025-12-04 04:30:17 UTC


#12 Multi-source Learning for Target Population by High-dimensional Calibration

Authors: Haoxiang Zhan, Jae Kwang Kim, Yumou Qiu

Multi-source learning is an emerging area of research in statistics, where information from multiple datasets with heterogeneous distributions is combined to estimate the parameter of interest for a target population without observed responses. We propose a high-dimensional debiased calibration (HDC) method and a multi-source HDC (MHDC) estimator for general estimating equations. The HDC method uses a novel approach to achieve Neyman orthogonality for the target parameter via high-dimensional covariate balancing on an augmented set of covariates. It avoids the augmented inverse probability weighting formulation and leads to an easier optimization algorithm for the target parameter in estimating equations and M-estimation. The proposed MHDC estimator integrates multi-source data while supporting flexible specifications for both density ratio and outcome regression models, achieving multiple robustness against model misspecification. Its asymptotic normality is established, and a specification test is proposed to examine the transferability condition for the multi-source data. Compared to the linear combination of single-source HDC estimators, the MHDC estimator improves efficiency by jointly utilizing all data sources. Through simulation studies, we show that the MHDC estimator accommodates multiple sources and multiple working models effectively and performs better than the existing doubly robust estimators for multi-source learning. An empirical analysis of a meteorological dataset demonstrates the utility of the proposed method in practice.

Subject: Methodology

Publish: 2025-12-04 03:14:52 UTC


#13 Learning Heterogeneous Ordinal Graphical Models via Bayesian Nonparametric Clustering

Authors: Wang Wen, Ziqi Chen, Guanyu Hu

Graphical models are powerful tools for capturing conditional dependence structures in complex systems but remain underexplored in analyzing ordinal data, especially in sports analytics. Ordinal variables, such as team rankings, player performance ratings, and survey responses, are pervasive in sports data but present unique challenges, particularly when accounting for heterogeneous subgroups, such as teams with varying styles or players with distinct roles. Existing methods, including probit graphical models, struggle with modeling heterogeneity and selecting the number of subgroups effectively. We propose a novel nonparametric Bayesian framework using the Mixture of Finite Mixtures (MFM) approach to address these challenges. Our method allows for flexible subgroup discovery and models each subgroup with a probit graphical model, simultaneously estimating the number of clusters and their configurations. We develop an efficient Gibbs sampling algorithm for inference, enabling robust estimation of cluster-specific structures and parameters. This framework is particularly suited to sports analytics, uncovering latent patterns in player performance metrics. Our work bridges critical gaps in modeling ordinal data and provides a foundation for advanced decision-making in sports performance and strategy.

Subject: Methodology

Publish: 2025-12-04 03:10:38 UTC


#14 Informative missingness and its implications in semi-supervised learning

Authors: Jinran Wu, You-Gan Wang, Geoffrey J. McLachlan

Semi-supervised learning (SSL) constructs classifiers using both labelled and unlabelled data. It leverages information from labelled samples, whose acquisition is often costly or labour-intensive, together with unlabelled data to enhance prediction performance. This defines an incomplete-data problem, which statistically can be formulated within the likelihood framework for finite mixture models that can be fitted using the expectation-maximisation (EM) algorithm. Ideally, one would prefer a completely labelled sample, as one would anticipate that a labelled observation provides more information than an unlabelled one. However, when the mechanism governing label absence depends on the observed features or the class labels or both, the missingness indicators themselves contain useful information. In certain situations, the information gained from modelling the missing-label mechanism can even outweigh the loss due to missing labels, yielding a classifier with a smaller expected error than one based on a completely labelled sample. This improvement arises particularly when class overlap is moderate, labelled data are sparse, and the missingness is informative. Modelling such informative missingness thus offers a coherent statistical framework that unifies likelihood-based inference with the behaviour of empirical SSL methods.

Subjects: Machine Learning, Machine Learning

Publish: 2025-12-04 02:26:56 UTC


#15 Sequential Randomization Tests Using E-values: A Betting Approach for Clinical Trials

Author: Fernando G Zampieri

Sequential monitoring of randomized trials traditionally relies on parametric assumptions or asymptotic approximations. We present a nonparametric sequential test, the randomization e-process (e-RT), that derives validity solely from the randomization mechanism. Using a betting framework, e-RT constructs a test martingale by sequentially wagering on treatment assignments given observed outcomes. Under the null hypothesis of no treatment effect, the expected wealth cannot grow, guaranteeing anytime-valid Type I error control regardless of stopping rule. We prove validity and present simulation studies demonstrating calibration and power. The e-RT provides a conservative, assumption-free complement to model-based sequential analyses.

Subjects: Methodology, Applications

Publish: 2025-12-04 01:24:17 UTC
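
The abstract of #15 describes building a test martingale by wagering on treatment assignments given observed outcomes. The sketch below is a generic toy version of that idea under stated assumptions and should not be read as the paper's e-RT: it assumes 1:1 randomization, a fixed `bet_fraction`, and a hypothetical guessing rule (pick the arm whose running mean of past outcomes is closer to the new outcome). Because the guess uses only past data and the current outcome, a fair-coin assignment makes each wealth multiplier have expectation one under the null, and Ville's inequality then bounds the chance that wealth ever reaches `1 / alpha` by `alpha`.

```python
def betting_wealth(outcomes, assignments, bet_fraction=0.5):
    """Wealth path of a bettor who guesses each 1:1 randomized assignment
    (1 = treated, 0 = control) from the outcome, using past data only."""
    sums = {0: 0.0, 1: 0.0}
    counts = {0: 0, 1: 0}
    wealth = 1.0
    path = []
    for y, a in zip(outcomes, assignments):
        guess = 0  # abstain until both arms have at least one past outcome
        if counts[0] > 0 and counts[1] > 0:
            d_treated = abs(y - sums[1] / counts[1])
            d_control = abs(y - sums[0] / counts[0])
            if d_treated < d_control:
                guess = 1
            elif d_control < d_treated:
                guess = -1
        signed_assignment = 1 if a == 1 else -1
        # Win bet_fraction of current wealth on a correct guess, lose it otherwise.
        wealth *= 1.0 + bet_fraction * guess * signed_assignment
        path.append(wealth)
        sums[a] += y
        counts[a] += 1
    return path


def rejects_null(path, alpha=0.05):
    """Reject as soon as wealth has ever reached 1 / alpha."""
    return any(w >= 1.0 / alpha for w in path)
```

With `bet_fraction` strictly between 0 and 1 the wealth stays positive, so monitoring can continue after any number of losing bets.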


#16 Reyes's I: Measuring Spatial Autocorrelation in Compositions

Authors: Lina Buitrago, Juan Sosa, Oscar Melo

Compositional observations arise when measurements are recorded as parts of a whole, so that only relative information is meaningful and the natural sample space is the simplex equipped with Aitchison geometry. Despite extensive development of compositional methods, a direct analogue of Moran's \(I\) for assessing spatial autocorrelation in areal compositional data has been lacking. We propose Reyes's \(I\), a Moran-type statistic defined through the Aitchison inner product and norm, which is invariant to scale, to permutations of the parts, and to the choice of the \(\operatorname{ilr}\) contrast matrix. Under the randomization assumption, we derive an upper bound, the expected value, and the noncentral second moment, and we describe exact and Monte Carlo permutation procedures for inference. Through simulations covering identical, independent, and spatially correlated compositions under multiple covariance structures and neighborhood definitions, we show that Reyes's \(I\) provides stable behavior, competitive calibration, and improved efficiency relative to a naive alternative based on averaging componentwise Moran statistics. We illustrate practical utility by studying the spatial dependence of a composition measuring COVID-19 severity across Colombian departments during January 2021, documenting significant positive autocorrelation early in the month that attenuates over time.

Subject: Methodology

Publish: 2025-12-03 22:05:18 UTC
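
For readers unfamiliar with the two ingredients named in the abstract of #16, the sketch below shows the classical scalar Moran's \(I\) and the Aitchison inner product (computed as the Euclidean inner product of centred log-ratio vectors). How Reyes's \(I\) combines them is not specified in the abstract and is not reproduced here; the function names are hypothetical.

```python
import math


def morans_i(values, weights):
    """Classical Moran's I for scalar values and a spatial weight matrix."""
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    s0 = sum(sum(row) for row in weights)  # total weight
    cross = sum(
        weights[i][j] * dev[i] * dev[j] for i in range(n) for j in range(n)
    )
    denom = sum(d * d for d in dev)
    return (n / s0) * (cross / denom)


def clr(composition):
    """Centred log-ratio transform of a composition with positive parts."""
    logs = [math.log(part) for part in composition]
    centre = sum(logs) / len(logs)
    return [value - centre for value in logs]


def aitchison_inner(x, y):
    """Aitchison inner product: Euclidean inner product of clr vectors."""
    return sum(a * b for a, b in zip(clr(x), clr(y)))
```

Rescaling a composition by a positive constant leaves its clr vector unchanged, which is the scale invariance the abstract refers to.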


#17 A Benchmark Study of Classical and Dual Polynomial Regression (DPR)-Based Probability Density Estimation Technique

Authors: Shantanu Sarkar, Mousumi Sinha, Dexter Cahoy

The probability density function (PDF) plays a central role in statistical and machine learning modeling. Real-world data often deviates from Gaussian assumptions, exhibiting skewness and exponential decay. To evaluate how well different density estimation methods capture such irregularities, we generated six unimodal datasets from diverse distributions that reflect real-world anomalies. These were compared using parametric methods (Pearson Type I and Normal distribution) as well as non-parametric approaches, including histograms, kernel density estimation (KDE), and our proposed method. To accelerate computation, we implemented GPU-based versions of KDE (tKDE) and histogram estimation (tHDE) in TensorFlow, both of which outperform Python SciPy's KDE. Prior work demonstrated the use of piecewise modeling for density estimation, such as local polynomial regression; however, these methods are computationally intensive. Based on the concept of piecewise modeling, we developed a computationally efficient model, the Dual Polynomial Regression (DPR) method, which leverages tKDE or tHDE for training. DPR employs the piecewise strategy to split the PDF at its mode and fit polynomial regressions to the left and right halves independently, enabling better capture of the asymmetric shape of the unimodal distribution. We used the Mean Squared Error (MSE), Jensen-Shannon Divergence (JSD), and Pearson's correlation coefficient, with reference to the baseline PDF, to validate accuracy. We verified normalization using Area Under the Curve (AUC) and computational overhead via execution time. Validation on real-world systolic and diastolic data from 300,000 unique patients shows that the DPR of order 4, trained with tKDE, offers the best balance between accuracy and computational overhead.

Subject: Computation

Publish: 2025-12-03 20:06:15 UTC


#18 Detecting Perspective Shifts in Multi-agent Systems

Authors: Eric Bridgeford, Hayden Helm

Generative models augmented with external tools and update mechanisms (or agents) have demonstrated capabilities beyond intelligent prompting of base models. As agent use proliferates, dynamic multi-agent systems have naturally emerged. Recent work has investigated the theoretical and empirical properties of low-dimensional representations of agents based on query responses at a single time point. This paper introduces the Temporal Data Kernel Perspective Space (TDKPS), which jointly embeds agents across time, and proposes several novel hypothesis tests for detecting behavioral change at the agent- and group-level in black-box multi-agent systems. We characterize the empirical properties of our proposed tests, including their sensitivity to key hyperparameters, in simulations motivated by a multi-agent system of evolving digital personas. Finally, we demonstrate via natural experiment that our proposed tests detect changes that correlate sensitively, specifically, and significantly with a real exogenous event. As far as we are aware, TDKPS is the first principled framework for monitoring behavioral dynamics in black-box multi-agent systems -- a critical capability as generative agent deployment continues to scale.

Subjects: Artificial Intelligence, Multiagent Systems, Methodology

Publish: 2025-12-04 17:24:56 UTC


#19 Bounds on Maximal Leakage over Bayesian Networks

Authors: Anuran Makur, Japneet Singh

Maximal leakage quantifies the leakage of information from data $X \in \mathcal{X}$ due to an observation $Y$. While fundamental properties of maximal leakage, such as data processing, sub-additivity, and its connection to mutual information, are well-established, its behavior over Bayesian networks is not well-understood and existing bounds are primarily limited to binary $\mathcal{X}$. In this paper, we investigate the behavior of maximal leakage over Bayesian networks with finite alphabets. Our bounds on maximal leakage are established by utilizing coupling-based characterizations which exist for channels satisfying certain conditions. Furthermore, we provide more general conditions under which such coupling characterizations hold for $|\mathcal{X}| = 4$. In the course of our analysis, we also present a new simultaneous coupling result on maximal leakage exponents. Finally, we illustrate the effectiveness of the proposed bounds with some examples.

Subjects: Information Theory, Probability, Statistics Theory

Publish: 2025-12-04 16:25:21 UTC


#20 Reliable Statistical Guarantees for Conformal Predictors with Small Datasets

Authors: Miguel Sánchez-Domínguez, Lucas Lacasa, Javier de Vicente, Gonzalo Rubio, Eusebio Valero

Surrogate models (including deep neural networks and other machine learning algorithms in supervised learning) are capable of approximating arbitrarily complex, high-dimensional input-output problems in science and engineering, but require a thorough data-agnostic uncertainty quantification analysis before these can be deployed for any safety-critical application. The standard approach for data-agnostic uncertainty quantification is to use conformal prediction (CP), a well-established framework to build uncertainty models with proven statistical guarantees that do not assume any shape for the error distribution of the surrogate model. However, since the classic statistical guarantee offered by CP is given in terms of bounds for the marginal coverage, for small calibration set sizes (which are frequent in realistic surrogate modelling that aims to quantify error at different regions), the potentially strong dispersion of the coverage distribution around its average negatively impacts the reliability of the uncertainty model, often obtaining coverages below the expected value, resulting in a less applicable framework. After providing a gentle presentation of uncertainty quantification for surrogate models for machine learning practitioners, in this paper we bridge the gap by proposing a new statistical guarantee that offers probabilistic information for the coverage of a single conformal predictor. We show that the proposed framework converges to the standard solution offered by CP for large calibration set sizes and, unlike the classic guarantee, still offers reliable information about the coverage of a conformal predictor for small data sizes. We illustrate and validate the methodology in a suite of examples, and implement an open access software solution that can be used alongside common conformal prediction libraries to obtain uncertainty models that fulfil the new guarantee.

Subjects: Machine Learning, Data Analysis, Statistics and Probability, Machine Learning

Publish: 2025-12-04 08:29:17 UTC
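
To make the small-calibration-set issue raised in the abstract of #20 concrete, the sketch below implements the standard split conformal threshold and a toy simulation of how coverage varies from one calibration set to the next. It illustrates the classic marginal guarantee only, not the new guarantee the paper proposes. The function names are hypothetical, and the simulation assumes scores drawn uniformly on [0, 1], for which the coverage of a threshold `q` is simply `q`.

```python
import math
import random


def conformal_threshold(calibration_scores, alpha):
    """Standard split conformal threshold: the ceil((n + 1)(1 - alpha))-th
    smallest calibration score, or infinity if that rank exceeds n."""
    n = len(calibration_scores)
    k = math.ceil((n + 1) * (1.0 - alpha))
    if k > n:
        return math.inf
    return sorted(calibration_scores)[k - 1]


def fraction_undercovering(n, alpha, trials=2000, seed=0):
    """Fraction of simulated calibration sets whose realized coverage falls
    below 1 - alpha, for Uniform(0, 1) scores (toy assumption)."""
    rng = random.Random(seed)
    below = 0
    for _ in range(trials):
        scores = [rng.random() for _ in range(n)]
        coverage = min(conformal_threshold(scores, alpha), 1.0)
        if coverage < 1.0 - alpha:
            below += 1
    return below / trials
```

In this toy setting the coverage averaged over calibration sets is at least `1 - alpha`, yet an individual small calibration set can still fall short of it with non-negligible frequency, which is the dispersion effect the abstract describes.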


#21 GraphBench: Next-generation graph learning benchmarking

Authors: Timo Stoll, Chendi Qian, Ben Finkelshtein, Ali Parviz, Darius Weber, Fabrizio Frasca, Hadar Shavit, Antoine Siraudin, Arman Mielke, Marie Anastacio, Erik Müller, Maya Bechler-Speicher, Michael Bronstein, Mikhail Galkin, Holger Hoos, Mathias Niepert, Bryan Perozzi, Jan Tönshoff, Christopher Morris

Machine learning on graphs has recently achieved impressive progress in various domains, including molecular property prediction and chip design. However, benchmarking practices remain fragmented, often relying on narrow, task-specific datasets and inconsistent evaluation protocols, which hampers reproducibility and broader progress. To address this, we introduce GraphBench, a comprehensive benchmarking suite that spans diverse domains and prediction tasks, including node-level, edge-level, graph-level, and generative settings. GraphBench provides standardized evaluation protocols -- with consistent dataset splits and performance metrics that account for out-of-distribution generalization -- as well as a unified hyperparameter tuning framework. Additionally, we benchmark GraphBench using message-passing neural networks and graph transformer models, providing principled baselines and establishing a reference performance. See www.graphbench.io for further details.

Subjects: Machine Learning , Artificial Intelligence , Neural and Evolutionary Computing , Machine Learning

Publish: 2025-12-04 05:30:31 UTC


#22 Constructive Approximation under Carleman's Condition, with Applications to Smoothed Analysis

Authors: Frederic Koehler, Beining Wu

A classical result of Carleman, based on the theory of quasianalytic functions, shows that polynomials are dense in $L^2(μ)$ for any $μ$ such that the moments $\int x^k dμ$ do not grow too rapidly as $k \to \infty$. In this work, we develop a fairly tight quantitative analogue of the underlying Denjoy-Carleman theorem via complex analysis, and show that this allows for nonasymptotic control of the rate of approximation by polynomials for any smooth function with polynomial growth at infinity. In many cases, this allows us to establish $L^2$ approximation-theoretic results for functions over general classes of distributions (e.g., multivariate sub-Gaussian or sub-exponential distributions) which were previously known only in special cases. As one application, we show that the Paley--Wiener class of functions bandlimited to $[-Ω,Ω]$ admits superexponential rates of approximation over all strictly sub-exponential distributions, which leads to a new characterization of the class. As another application, we solve an open problem recently posed by Chandrasekaran, Klivans, Kontonis, Meka and Stavropoulos on the smoothed analysis of learning, and also obtain quantitative improvements to their main results and applications.
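
For reference, the growth restriction on the moments is usually written as follows in the classical one-dimensional setting. This is the textbook form of Carleman's condition, stated here as background. The notation $m_{2k}$ is introduced for illustration, and the paper's precise hypotheses may differ.

```latex
m_{2k} = \int x^{2k} \, d\mu(x), \qquad
\sum_{k=1}^{\infty} m_{2k}^{-1/(2k)} = +\infty .
% Example: for the standard Gaussian, m_{2k} = (2k-1)!! \le (2k)^k,
% so m_{2k}^{-1/(2k)} \ge (2k)^{-1/2} and the series diverges.
```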

Subjects: Probability , Machine Learning , Functional Analysis , Statistics Theory , Machine Learning

Publish: 2025-12-04 01:40:05 UTC


#23 When do spectral gradient updates help in deep learning?

Authors: Damek Davis, Dmitriy Drusvyatskiy

Spectral gradient methods, such as the recently popularized Muon optimizer, are a promising alternative to standard Euclidean gradient descent for training deep neural networks and transformers, but it is still unclear in which regimes they are expected to perform better. We propose a simple layerwise condition that predicts when a spectral update yields a larger decrease in the loss than a Euclidean gradient step. This condition compares, for each parameter block, the squared nuclear-to-Frobenius ratio of the gradient to the stable rank of the incoming activations. To understand when this condition may be satisfied, we first prove that post-activation matrices have low stable rank at Gaussian initialization in random feature regression, feedforward networks, and transformer blocks. In spiked random feature models we then show that, after a short burn-in, the Euclidean gradient's nuclear-to-Frobenius ratio grows with the data dimension while the stable rank of the activations remains bounded, so the predicted advantage of spectral updates scales with dimension. We validate these predictions in synthetic regression experiments and in NanoGPT-scale language model training, where we find that intermediate activations have low stable rank throughout training and the corresponding gradients maintain large nuclear-to-Frobenius ratios. Together, these results identify conditions for spectral gradient methods, such as Muon, to be effective in training deep networks and transformers.
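
The two quantities in the layerwise condition can be computed directly from singular values. The sketch below uses the standard definitions of the nuclear norm, Frobenius norm and stable rank. The function names are hypothetical, and the direction of the comparison (spectral step favored when the gradient ratio exceeds the activation stable rank) is an assumption inferred from the abstract, which does not state the inequality explicitly.

```python
import numpy as np


def nuclear_to_frobenius_sq(G):
    """Squared ratio of nuclear norm to Frobenius norm of G."""
    s = np.linalg.svd(G, compute_uv=False)
    return float(s.sum() ** 2 / (s ** 2).sum())


def stable_rank(A):
    """Stable rank: squared Frobenius norm over squared spectral norm."""
    s = np.linalg.svd(A, compute_uv=False)  # sorted in descending order
    return float((s ** 2).sum() / s[0] ** 2)


def spectral_update_favored(G, A):
    """Assumed reading of the condition: gradient ratio exceeds activation stable rank."""
    return nuclear_to_frobenius_sq(G) > stable_rank(A)


# Toy block: a gradient with a flat spectrum and rank-one incoming activations.
G = np.eye(8)
A = np.ones((4, 8))
print(nuclear_to_frobenius_sq(G), stable_rank(A), spectral_update_favored(G, A))
```

Both quantities lie between 1 and the rank of the matrix, so the comparison is scale-free: rescaling the gradient or the activations leaves it unchanged.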

Subjects: Machine Learning , Optimization and Control , Machine Learning

Publish: 2025-12-03 22:22:09 UTC


#24 The Geometry of Benchmarks: A New Path Toward AGI

Author: Przemyslaw Chojecki

Benchmarks are the primary tool for assessing progress in artificial intelligence (AI), yet current practice evaluates models on isolated test suites and provides little guidance for reasoning about generality or autonomous self-improvement. Here we introduce a geometric framework in which all psychometric batteries for AI agents are treated as points in a structured moduli space, and agent performance is described by capability functionals over this space. First, we define an Autonomous AI (AAI) Scale, a Kardashev-style hierarchy of autonomy grounded in measurable performance on batteries spanning families of tasks (for example reasoning, planning, tool use and long-horizon control). Second, we construct a moduli space of batteries, identifying equivalence classes of benchmarks that are indistinguishable at the level of agent orderings and capability inferences. This geometry yields determinacy results: dense families of batteries suffice to certify performance on entire regions of task space. Third, we introduce a general Generator-Verifier-Updater (GVU) operator that subsumes reinforcement learning, self-play, debate and verifier-based fine-tuning as special cases, and we define a self-improvement coefficient $κ$ as the Lie derivative of a capability functional along the induced flow. A variance inequality on the combined noise of generation and verification provides sufficient conditions for $κ > 0$. Our results suggest that progress toward artificial general intelligence (AGI) is best understood as a flow on moduli of benchmarks, driven by GVU dynamics rather than by scores on individual leaderboards.

Subjects: Artificial Intelligence , Machine Learning , Statistics Theory

Publish: 2025-12-03 21:34:09 UTC


#25 High-Resolution Retrieval of Atmospheric Boundary Layers with Nonstationary Gaussian Processes

Authors: Haoran Xiong, Paytsar Muradyan, Christopher J. Geoga

The atmospheric boundary layer (ABL) plays a critical role in governing turbulent exchanges of momentum, heat, moisture, and trace gases between the Earth's surface and the free atmosphere, thereby influencing meteorological phenomena, air quality, and climate processes. Accurate and temporally continuous characterization of the ABL structure and height evolution is crucial for both scientific understanding and practical applications. High-resolution retrieval of the ABL height from vertical velocity measurements is challenging because the height is often estimated using empirical thresholds applied to profiles of vertical velocity variance or related turbulence diagnostics at each measurement altitude, which can suffer from limited sampling and sensitivity to noise. To address these limitations, this work employs nonstationary Gaussian process (GP) modeling to more effectively capture the spatio-temporal dependence structure in the data, enabling high-quality -- and, if desired, high-resolution -- estimates of the ABL height without reliance on ad-hoc parameter tuning. By leveraging Vecchia approximations, the proposed method can be applied to large-scale datasets, and example applications using full-day vertical velocity profiles comprising approximately $5$M measurements are presented.

Subjects: Atmospheric and Oceanic Physics , Computation , Methodology

Publish: 2025-12-03 19:40:28 UTC