Machine Learning

https://papers.cool/arxiv/stat.ML Machine Learning 2026-07-22T00:00:00+00:00 Cool Papers - Immersive Paper Discovery

https://papers.cool/arxiv/2607.19334 Fundamental limits of distributed multiclass classification from simple binary decisions 2026-07-21T17:53:44+00:00 Ioannis Papageorgiou Srinivas Nomula Ayalvadi Ganesh Sidharth Jaggi Parimal Parag

We consider the problem of constructing a $K$-class classifier from the combination of $O(\log K)$ simple binary classifiers -- this is a natural paradigm to construct a sophisticated classifier in a distributed manner with each agent performing a relatively straightforward task. We study the fundamental performance limits of such a classifier when the corresponding binary classifiers are hyperplanes. For a stylized Gaussian setting where the $K$ class centers are independent Gaussian points in $\mathbb R^d$ and the observations are corrupted by Gaussian noise, we derive explicit performance bounds across several decoding and dimensional regimes. Extensive simulation experiments provide strong empirical validation of the presented theoretical results.
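The setting above can be illustrated with a small Monte Carlo sketch. It is not the paper's construction: the random hyperplanes through the origin, the minimum-Hamming-distance decoder, the choice of `n_bits = 32` (a constant multiple of $\log_2 K$ for $K = 16$), and the function name `simulate_accuracy` are all assumptions made for illustration.

```python
import numpy as np

def simulate_accuracy(K=16, d=64, n_bits=32, noise_std=0.3, n_trials=500, seed=0):
    """Monte Carlo accuracy of a K-class decoder built from n_bits hyperplane bits."""
    rng = np.random.default_rng(seed)
    centers = rng.standard_normal((K, d))       # independent Gaussian class centers
    normals = rng.standard_normal((n_bits, d))  # assumed: random hyperplanes through the origin
    codebook = centers @ normals.T > 0          # (K, n_bits): one binary codeword per class

    labels = rng.integers(K, size=n_trials)
    obs = centers[labels] + noise_std * rng.standard_normal((n_trials, d))
    bits = obs @ normals.T > 0                  # each agent reports a single bit

    # Assumed decoder: pick the class whose codeword is closest in Hamming distance.
    hamming = (bits[:, None, :] != codebook[None, :, :]).sum(axis=2)
    return float((hamming.argmin(axis=1) == labels).mean())
```

Calling `simulate_accuracy` with different `noise_std`, `d`, or `n_bits` values gives a rough empirical feel for how the error of such a combined classifier depends on noise level and dimension.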

https://papers.cool/arxiv/2607.19004 The Tractability Landscape of Sampling with Inexact Scores 2026-07-21T11:37:57+00:00 Anming Gu Kevin Tian Hubert Yang Yusong Zhu

We provide a simple and tight characterization of the types of inexact score oracle access that permit sampling with vanishing total variation bias, for a standard, well-behaved target family. Our main result shows that any weaker error than the sub-Gaussian assumption used by [YW26] rules out the tractability of unbiased sampling. This strengthens the conclusion of [CCSW26] to be algorithm-agnostic, and to hold for a wider range of error assumptions.

https://papers.cool/arxiv/2607.18817 Algebraic Signatures for Structural Learning in Probability Tensors 2026-07-21T07:54:12+00:00 Akihiro Maeda Shohei Hidaka Satoshi Aoki

Algebraic statistics characterizes statistical models through polynomial constraints, but it has mainly been used for analytically specified model classes. This paper studies the inverse problem: identifying probabilistic structure from vanishing binomials observed in empirical probability tensors. We treat the vanishing binomials of a toric model as its algebraic signature, and turn the ideal-variety correspondence of algebraic statistics into an operational procedure for structural learning that identifies a model by signature matching without parameter estimation. By restricting attention to a computationally tractable class of configuration matrices, which we call {\it the Kronecker-stack class}, we make these signatures explicitly enumerable. Within this class we define the minimum invariant constraint (MIC) as the atomic unit characterizing each signature and generalizing the notion of independence. We tested this MIC-based approach on synthetic data as well as on corpus-scale real language data. The results suggested the utility of the method, revealing that the identified rank-one structures correspond to interpretable sets of words. These results open up a new avenue for applying algebraic statistics to computational linguistics.
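The simplest instance of a vanishing binomial is the independence model for a two-way table, where every $2\times 2$ minor $p_{ij}p_{kl} - p_{il}p_{kj}$ is zero exactly when the table has rank one. The sketch below checks these binomials on a probability table. It only illustrates the idea of reading structure off vanishing binomials; the function names and the tolerance `tol` are hypothetical, and it does not implement the Kronecker-stack class or MICs.

```python
from itertools import combinations
import numpy as np

def two_by_two_binomials(P):
    """Return all binomials p[i,j]*p[k,l] - p[i,l]*p[k,j] of a two-way table P."""
    n_rows, n_cols = P.shape
    return np.array([
        P[i, j] * P[k, l] - P[i, l] * P[k, j]
        for i, k in combinations(range(n_rows), 2)
        for j, l in combinations(range(n_cols), 2)
    ])

def looks_rank_one(P, tol=1e-3):
    """True if every 2x2 binomial vanishes up to tol (tol is an assumed threshold)."""
    return bool(np.max(np.abs(two_by_two_binomials(P))) < tol)
```

For an empirical table estimated from counts, the binomials vanish only approximately, so the threshold would have to reflect sampling noise.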

https://papers.cool/arxiv/2607.18652 The Price of Hidden Curvature: An $\widetilde{Ω}(d^{5/4} \sqrt{T})$ Lower Bound for Bandit Convex Optimization 2026-07-21T02:44:33+00:00 Nived Rajaraman

We establish a $\widetilde{Ω}(d^{5/4}\sqrt{T})$ lower bound on the minimax expected regret of stochastic bandit convex optimization of $1$-Lipschitz functions on the Euclidean ball. This presents the first nontrivial regret lower bound that grows faster than $d\sqrt{T}$ for this problem, establishing that stochastic bandit convex optimization is fundamentally harder than linear bandits. The hard class of convex functions we construct takes the following form in dimension $2d$: for an action $a = (a^1,a^2) \in \mathbb{B}^{2d}_2$, each function is the scaled soft maximum of a "tube", $r^{-1} \| W^\star a^1 - \frac{r}{8\varepsilon} a^2 \|_2$ (hyperparameterized by $\varepsilon,r$), and a squared distance function, $\frac12 \| a^1 - u^\star \|_2^2 - \frac12 \| u^\star \|_2^2$. Here, $W^\star \in \mathbb{R}^{d \times d}$ is an unknown linear transformation, and $u^\star \in \mathbb{R}^{d}$ is an unknown vector which must be learned to minimize the function. Observations are informative about $u^\star$ only when the learner's action lies near the tube determined by $W^\star$, satisfying $a^2 \approx \frac{8\varepsilon}{r} W^\star a^1$: thus the learner must either find this tube without knowing $W^\star$, or spend observations learning useful directions of $W^\star$. Formally, our regret analysis exploits this tradeoff by bounding the posterior spread of Fisher information matrices obtained under an adaptive sequence of actions. Together, these ingredients give a sample complexity lower bound of $\widetilde{Ω}(d^{5/2}/\varepsilon^2)$ to find an $\varepsilon$-optimal action, which translates to an $\widetilde{Ω}(d^{5/4} \sqrt{T})$ regret lower bound. We also extend this lower bound to the unconstrained setting where the action space is $\mathbb{R}^d$.

https://papers.cool/arxiv/2607.18559 Mixing-Free and Signal-Optimal Learning of Gaussian Graphical Models from Glauber Dynamics 2026-07-20T22:44:45+00:00 Vignesh Tirukkonda Gautam Dasarathy

Gaussian graphical model selection is usually studied under independent sampling, but in many applications the data arise as a single trajectory of a dependent stochastic process. We study exact recovery of the graph from one trajectory of random-scan Gaussian Glauber dynamics. Existing techniques for this problem either inherit the mixing time of the chain, which can be super-polynomial in the dimension $p$ without strong assumptions, or are suboptimal in the minimum normalized edge strength $κ$. We propose two algorithms that are mixing-free and attain the $κ^{-2}$ dependence of the information-theoretic lower bounds. Both instantiate a shared dueling-neighborhood search meta-algorithm with a local statistic built directly from the update sequence. The first fits a least-squares regression at the updates of each node and recovers the graph from $\widetilde O(pd^{2}/κ^{2})$ updates, where $d$ is the maximum degree. This algorithm's data requirement depends on a local conditioning quantity, but only logarithmically, and is provably optimal even when the underlying chain mixes slowly. The second algorithm is based on counting occurrences of a specific update pattern and requires $\widetilde O(pd^{4}/κ^{2})$ updates, with no dependence on any condition number. The central technical challenge is that both statistics are built from dependent, non-stationary observations. Our analysis tackles this by demonstrating how to extract fresh Gaussian innovations from the update sequence, which yields mixing-free control of appropriate quantities. Neither the algorithms nor their analyses invoke stationarity, a spectral gap, or mixing conditions, and all guarantees hold from an arbitrary initialization.
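To make the data model concrete, the following sketch simulates random-scan Gaussian Glauber dynamics for a precision matrix `theta` and then regresses each node's freshly updated value on the other coordinates at that node's update times. It is a plain thresholded least-squares illustration, not the dueling-neighborhood search described above; the function names, the zero initialization, and the `threshold` value are assumptions.

```python
import numpy as np

def glauber_trajectory(theta, n_steps, rng):
    """Random-scan Gaussian Glauber dynamics for precision matrix theta."""
    p = theta.shape[0]
    x = np.zeros(p)                              # arbitrary (here: zero) initialization
    nodes = rng.integers(p, size=n_steps)        # node resampled at each step
    states = np.empty((n_steps, p))
    for t, i in enumerate(nodes):
        # Conditional law of x_i given the rest: N(-sum_j theta_ij x_j / theta_ii, 1 / theta_ii)
        cond_mean = -(theta[i] @ x - theta[i, i] * x[i]) / theta[i, i]
        x[i] = cond_mean + rng.standard_normal() / np.sqrt(theta[i, i])
        states[t] = x
    return nodes, states

def estimate_edges(theta, n_steps=40_000, threshold=0.2, seed=0):
    """Threshold per-node least-squares coefficients (assumed rule, for illustration)."""
    rng = np.random.default_rng(seed)
    nodes, states = glauber_trajectory(theta, n_steps, rng)
    p = theta.shape[0]
    adj = np.zeros((p, p), dtype=bool)
    for i in range(p):
        rows = states[nodes == i]                # states right after node i was updated
        others = np.delete(np.arange(p), i)
        coef, *_ = np.linalg.lstsq(rows[:, others], rows[:, i], rcond=None)
        adj[i, others] = np.abs(coef) > threshold  # coef estimates -theta_ij / theta_ii
    return adj
```

The regression works on a single dependent trajectory because each update injects a fresh Gaussian innovation that is independent of the coordinates it is regressed on.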

https://papers.cool/arxiv/2607.18298 Disentangling Forced and Internal Climate Variability in Single Realizations using Dynamic Mode Decomposition with Control 2026-07-13T18:00:05+00:00 Nathan Mankovich Andrei Gavrilov Gustau Camps-Valls

We show that a single climate realization can be decomposed into forced and internal components by treating external forcing as a dynamical driver within a linear stochastic system, an idea grounded in pullback attractor theory. In doing so, we address a central methodological challenge in climate science, one with direct implications for climate projection and the detection and attribution of the forced response: disentangling the forced climate response from internal variability in a single observed record. Statistical methods range from approaches trained on large ensembles to techniques operating on single realizations. The latter often rely on linear frameworks such as linear inverse models (LIMs) and linear regression. LIMs ignore forcing predictors, whereas linear regression omits climate system dynamics. Here we introduce PullbackDMDc, a method grounded in non-autonomous dynamical systems theory and dynamic mode decomposition with control (DMDc), incorporating pullback attractor estimation to decompose a single climate realization into spatial modes and their associated forced and internal components, yielding a physically interpretable picture of the underlying dynamics. We illustrate the utility of PullbackDMDc for Earth System Model (ESM) evaluation by applying it to near-surface air temperature and sea-level pressure from reanalysis and four ESM large ensembles. PullbackDMDc estimates the forced response with skill matching or exceeding established baselines and identifies optimal forcing predictors against model-based ground truth. Its internal variability components reveal that ESMs qualitatively capture interannual and decadal modes while exhibiting systematic differences relative to each other and to observations. Skillful forced response estimation and a novel decomposition position PullbackDMDc as a practical tool for single-realization climate analysis and ESM evaluation.
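For readers unfamiliar with DMDc, the sketch below shows the basic least-squares fit of $x_{t+1} = A x_t + B u_t$ and one simple way to split a record into a forcing-driven part and a remainder. This is plain DMDc without rank truncation, not PullbackDMDc: the pullback attractor estimation is not reproduced, and the synthetic system, the function names, and the convention that the forced component starts at zero are all assumptions.

```python
import numpy as np

def fit_dmdc(X, U):
    """Least-squares fit of x[t+1] = A x[t] + B u[t].
    X: (n, T+1) states, U: (m, T) forcing."""
    n = X.shape[0]
    stacked = np.vstack([X[:, :-1], U])          # regressors [x_t; u_t]
    G = X[:, 1:] @ np.linalg.pinv(stacked)       # G = [A B]
    return G[:, :n], G[:, n:]

def split_forced_internal(X, U, A, B):
    """Forced part: the model driven by U alone from zero (assumed convention)."""
    forced = np.zeros_like(X)
    for t in range(U.shape[1]):
        forced[:, t + 1] = A @ forced[:, t] + B @ U[:, t]
    return forced, X - forced

def synthetic_record(T=2000, seed=0):
    """Toy 2-state system with a trend, a cycle, and irregular forcing (hypothetical)."""
    rng = np.random.default_rng(seed)
    A = np.array([[0.9, 0.1], [-0.1, 0.8]])
    B = np.array([[0.5], [0.2]])
    t = np.arange(T)
    U = (0.001 * t + np.sin(2 * np.pi * t / 50) + 0.3 * rng.standard_normal(T))[None, :]
    X = np.zeros((2, T + 1))
    for k in range(T):
        X[:, k + 1] = A @ X[:, k] + B @ U[:, k] + 0.1 * rng.standard_normal(2)
    return X, U, A, B
```

On the synthetic record, `fit_dmdc` should recover `A` and `B` approximately, and `split_forced_internal` then returns two series that sum back to the original record by construction.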

https://papers.cool/arxiv/2607.19333 Provable diffusion-based posterior sampling for linear inverse problems via DDIM 2026-07-21T17:53:36+00:00 Yuchen Jiao Na Li Changxiao Cai Yuxin Chen Gen Li

Diffusion-based methods have achieved remarkable empirical success in solving inverse problems. However, many existing posterior samplers either lack rigorous theoretical guarantees or incur substantial computational overhead. We propose a simple and efficient algorithm, called \pddim, for solving linear inverse problems with diffusion priors via a DDIM-type sampler. Our method requires only lightweight, coordinate-wise modifications to the standard DDIM update, while explicitly incorporating the measurement model. The key idea is to perform posterior sampling separately along each singular direction of the measurement operator: for each direction, the sampler follows the learned diffusion prior when the observation signal-to-noise ratio (SNR) is below the corresponding diffusion SNR, and switches to a calibrated measurement-based predictor otherwise. We prove that the proposed sampler converges to the Bayesian posterior conditioned on the measurements. Empirical results show that the proposed sampler performs favorably against existing diffusion-based posterior samplers across a range of image restoration tasks, achieving the best performance on the majority of evaluation metrics considered. Overall, our results convert posterior sampling for noisy linear inverse problems to simple coordinate-wise DDIM updates, yielding an efficient, easy-to-implement algorithm with provable posterior consistency.

https://papers.cool/arxiv/2607.19206 Some cautionary tales about Bayesian predictive inference 2026-07-21T15:38:15+00:00 Emanuela Dreassi Fabrizio Leisen Luca Pratelli Pietro Rigo

Two misunderstandings, frequently arising in Bayesian predictive inference, are discussed. The first deals with the data generating mechanism, while the second consists in overestimating the role played by asymptotic exchangeability. Some consequences of such misunderstandings are highlighted through examples.

https://papers.cool/arxiv/2607.19167 Boundary-Adapted PINNs for Elliptic Dirichlet Problems: $H^2(Ω)$ A Priori Error Bounds with Application to Mean Escape Time Computation 2026-07-21T15:03:42+00:00 Nathanael Tepakbong Jun Fan Xiang Zhou Ding-Xuan Zhou

Motivated by the numerical computation of the Mean Escape Time (MET) $τ:Ω\to\mathbb{R}$ of a stochastic process from a bounded domain $Ω\subseteq\mathbb{R}^d$, we study elliptic Dirichlet boundary value problems (BVPs) using boundary-enforced Physics-Informed Neural Networks (PINNs), in which the Dirichlet condition is imposed exactly by multiplying the network output with a predefined distance-to-boundary approximation $ρ$. Combining approximation-theoretic and statistical-learning arguments for Rectified Quadratic Unit (ReQU) and hyperbolic tangent (tanh) networks, we derive a priori error bounds that make explicit the dependence on $ρ$. In particular, we show that exact boundary enforcement alone is not enough for $H^2(Ω)$ error bounds, and that a sufficient and essentially necessary condition is for $ρ$ to be a smooth distance approximation $\textit{normalized to first order}$, of the kind constructed in arXiv:2104.08426 [math.NA]. We thereby identify this subclass of $\textit{boundary-adapted}$ PINNs as the appropriate neural network ansatz for solving Dirichlet BVPs. Numerical experiments support the theory, showing that appropriate choices of $ρ$ improve accuracy and convergence, while poorly chosen distance functions can substantially degrade the solution. Our proof also yields new VC-dimension bounds for hypothesis spaces of higher-order derivatives of ReQU and tanh networks, together with new approximation bounds for shallow ReQU networks in higher-order Sobolev norms, all of which are of independent interest.
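The boundary-enforced ansatz $u = ρ \cdot N$ can be sketched in one dimension on $Ω = (0,1)$. The choice $ρ(x) = x(1-x)$, the shallow tanh network, and all function names below are assumptions made for illustration; the sketch only shows that the Dirichlet condition holds for any network parameters, and does not train a PINN.

```python
import numpy as np

def rho(x):
    """Assumed distance-like function on (0, 1): zero at both endpoints,
    with unit derivative along the inward normal there."""
    return x * (1.0 - x)

def tanh_net(x, params):
    """Shallow tanh network N(x) = W2 . tanh(W1 * x + b1) + b2 for scalar inputs."""
    W1, b1, W2, b2 = params
    return np.tanh(np.outer(x, W1) + b1) @ W2 + b2

def boundary_adapted_u(x, params):
    """Ansatz u(x) = rho(x) * N(x); u vanishes on the boundary whatever params are."""
    return rho(x) * tanh_net(x, params)

def random_params(width=8, seed=0):
    rng = np.random.default_rng(seed)
    return (rng.standard_normal(width), rng.standard_normal(width),
            rng.standard_normal(width), rng.standard_normal())
```

As a sanity check on this toy choice: for standard Brownian motion on $(0,1)$ the MET solves $\tfrac12 τ'' = -1$ with $τ(0)=τ(1)=0$, so $τ(x) = x(1-x)$, which is exactly `rho` multiplied by a constant network output of 1.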

https://papers.cool/arxiv/2607.19161 On the sensitivity of machine-learned probabilistic weather forecast models to scale-aware scoring rules 2026-07-21T14:59:11+00:00 Simon Lang Martin Leutbecher Sam Hatfield

Probabilistic forecast models can be machine-learned from data using loss functions based on scoring rules such as the Continuous Ranked Probability Score (CRPS). This note summarises a preliminary study comparing versions of AIFS-CRPS, a global weather forecast model, trained with different univariate and multivariate scoring rules that aim to explicitly represent scale-awareness in the loss function. In the first part, we compare the (almost) fair CRPS, a fair global energy score, and a graph energy score based on node neighbourhoods. Across standard verification metrics, forecast skill is broadly similar. In the extratropics we find only small differences, while in the tropics the graph energy score setup performs somewhat better and the global energy score shows some degradation. These results suggest that multivariate scores are a viable alternative to CRPS-based training for global machine-learned weather forecasting. In the second part of the study, we analyse how different scoring rules and scale-aware loss constraints shape the spectra of forecast fields. It is apparent that any form of explicit scale-awareness improves realism. Here, the largest differences are likely associated with different effective weights per scale.
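For reference, the ensemble estimators behind these scores can be written in a few lines. The sketch below gives the standard and fair ensemble CRPS for a scalar observation and the corresponding energy score for a vector observation. It is a generic illustration with hypothetical function names; it does not reproduce the almost fair CRPS, the graph energy score, or any AIFS-CRPS training code.

```python
import numpy as np

def crps_ensemble(ens, obs, fair=True):
    """Ensemble CRPS for a scalar observation.
    fair=True divides the spread term by 2*M*(M-1) instead of 2*M*M."""
    ens = np.asarray(ens, dtype=float)
    m = ens.size
    skill = np.abs(ens - obs).mean()
    pair_sum = np.abs(ens[:, None] - ens[None, :]).sum()
    denom = 2 * m * (m - 1) if fair else 2 * m * m
    return float(skill - pair_sum / denom)

def energy_score(ens, obs, fair=True):
    """Energy score for ensemble members of shape (M, d) and an observation of shape (d,)."""
    ens = np.asarray(ens, dtype=float)
    obs = np.asarray(obs, dtype=float)
    m = ens.shape[0]
    skill = np.linalg.norm(ens - obs, axis=1).mean()
    pair_sum = np.linalg.norm(ens[:, None, :] - ens[None, :, :], axis=2).sum()
    denom = 2 * m * (m - 1) if fair else 2 * m * m
    return float(skill - pair_sum / denom)
```

In one dimension the energy score reduces to the CRPS, which gives a quick consistency check between the two functions.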

https://papers.cool/arxiv/2607.19060 Deep learning-based prediction of time-resolved adhesive forces in viscoelastic Hertzian contacts 2026-07-21T12:48:48+00:00 Ali Maghami Merten Stender Michele Ciavarella Antonio Papangelo

Fast prediction of the response of adhesive soft viscoelastic contacts represents a current challenge in soft robotics and for gripping and manipulation tasks. Determining the complete time-resolved force trajectory requires full numerical simulations, whose computational cost is strongly parameter-dependent, making them impractical for real-time application or design-optimization loops. In this work, we overcome this limitation by training a scalar-conditioned, stateful, sequence-to-sequence deep learning model to predict the full force evolution from a prescribed displacement history for both short- and long-range adhesion regimes. The data set spans four orders of magnitude in loading and unloading rates and includes varied dwell times, with the Tabor parameter ranging from $0.2$ to $3.2$. To enable learning across these heterogeneous time scales, we introduce a fixed-measurement-step (FMS) representation that converts variable-length trajectories into fixed-length sequences while preserving their physical-time information. Different architectures were trained, including long short-term memory (LSTM) networks, temporal convolutional networks (TCNs), and time-distributed dense layers with three different Tabor-conditioning mechanisms. The models were compared using global waveform and error metrics. We found that the best-performing model has an LSTM architecture with concatenated conditioning, which achieves a held-out mean-squared error of $5.0\times10^{-4}$, a median pull-off-force error of $\approx2.2\%$, and a median hysteresis error of $\approx1.1\%$. For the held-out protocols, the model predicts a complete force trajectory with a median inference time of $0.16$ s. The model is tested across unseen parameter combinations and against analytical limiting cases, providing a rapid surrogate for repeated numerical evaluations with potential use in control-oriented applications.

https://papers.cool/arxiv/2607.18866 Optimizing Regret 2026-07-21T08:58:25+00:00 Irene Aldridge

Building on the identity that expected regret equals the covariance between costs and decisions, this paper develops the complete derivative theory of the covariance regret functional. We derive the Gâteaux derivative, showing that the universal steepest-descent direction is the contrarian policy $-(c-\bar{c})$, while ascent yields momentum. For linear policies $\hatπ(c) = Ac+b$, the gradient is the cost covariance matrix $Σ_c$, with a zero Hessian implying boundary-optimal solutions such as the minimum-variance portfolio. We extend to constrained optimization, sign-gradient duality between regret minimization and alpha maximization, finite-sample convergence bounds paralleling Thompson Sampling, and gradient-descent algorithms requiring only input observations, with applications to portfolio tilting and LLM-based allocation strategies.
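The claim about linear policies can be checked numerically under one reading of the functional. The sketch below assumes the covariance regret is $\sum_i \mathrm{Cov}(c_i, \hatπ_i(c))$, which for $\hatπ(c) = Ac + b$ equals $\mathrm{tr}(AΣ_c)$; that reading, and the function names, are assumptions rather than the paper's definitions. Under it, the functional is linear in $A$, so the gradient is $Σ_c$ and the Hessian vanishes.

```python
import numpy as np

def covariance_regret(A, b, costs):
    """Empirical sum_i Cov(c_i, pi_i(c)) for the linear policy pi(c) = A c + b.
    costs has shape (n_samples, n_assets)."""
    decisions = costs @ A.T + b
    c_centered = costs - costs.mean(axis=0)
    d_centered = decisions - decisions.mean(axis=0)
    return float((c_centered * d_centered).sum(axis=1).mean())

def numerical_gradient(A, b, costs, eps=1e-4):
    """Forward-difference gradient of the functional with respect to A."""
    grad = np.zeros_like(A)
    base = covariance_regret(A, b, costs)
    for i in range(A.shape[0]):
        for j in range(A.shape[1]):
            bumped = A.copy()
            bumped[i, j] += eps
            grad[i, j] = (covariance_regret(bumped, b, costs) - base) / eps
    return grad
```

Comparing `numerical_gradient` with `np.cov(costs, rowvar=False, bias=True)` on any sample should show agreement up to floating-point error, independent of the values of `A` and `b`.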

https://papers.cool/arxiv/2607.18804 Elicitation without Backpropagation: Steering Model Behavior by Optimizing the Latent Posterior 2026-07-21T07:34:18+00:00 Garrett Baker Vinayak Pathak Daniel Murfet Susan Wei

In the \emph{latent posterior model} of transformer behavior, the next-token distribution arises from a posterior over latent predictive models conditioned on the context, mixed to generate continuations. We exploit this model in settings where it is exact, namely Bayes-filtered transformers (BFTs) meta-learned on sequences from a hierarchical prior, to introduce \textbf{Posterior Prefix Tuning (PPT)}, a new method for \emph{eliciting} behavior from a transformer: given a utility function on continuations, find a prompt under which the transformer generates continuations of high expected utility. For a BFT, the elicitation objective factors through the latent posterior, and the gradient of this objective can be estimated from samples of the prior alone. PPT optimizes the parameters of a distribution over hard prompts: it draws prior samples once from the BFT via predictive Monte Carlo (PMC), then estimates the gradient by importance sampling against them. The optimization performs no transformer forward passes and no backpropagation through the transformer, and the prior samples are utility-independent, so a single set of samples drives elicitation against any number of utilities at negligible marginal cost. We validate PPT on Beta--Bernoulli and reinforced urn BFTs across three utility families (reverse cross-entropy, frequency matching, Dyck validity).
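The role of importance sampling against fixed prior samples can be illustrated in the Beta--Bernoulli case. In the sketch below, latent parameters are drawn once from the prior, and the posterior expectation of a utility given a binary prompt is estimated by reweighting those samples with the prompt likelihood. This is only the reweighting step: it is not PPT, it does not optimize over prompts, and the function name and the use of a utility defined directly on the latent parameter are assumptions.

```python
import numpy as np

def posterior_expectation(prior_thetas, prompt, utility):
    """Self-normalized importance-sampling estimate of E[utility(theta) | prompt]
    using samples from the prior only."""
    prompt = np.asarray(prompt)
    k, n = prompt.sum(), prompt.size
    # Log-likelihood of the Bernoulli prompt under each prior sample.
    log_w = k * np.log(prior_thetas) + (n - k) * np.log1p(-prior_thetas)
    w = np.exp(log_w - log_w.max())              # subtract the max for numerical stability
    w /= w.sum()
    return float(np.sum(w * utility(prior_thetas)))
```

Because the samples do not depend on the utility, the same `prior_thetas` array can be reused for any number of utilities or candidate prompts. With a Beta$(2,2)$ prior and the prompt `[1, 1, 0, 1]`, the exact posterior is Beta$(5,3)$, so the estimate for `utility = lambda t: t` can be compared against the closed-form mean $5/8$.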

https://papers.cool/arxiv/2607.18734 Uncertainty quantification in mechanics: A unified Bayesian perspective 2026-07-21T05:46:28+00:00 Sascha Ranftl Malte Rolf Gerhard A. Holzapfel Ellen Kuhl

Uncertainty quantification (UQ) is essential to experimental mechanics, but has become particularly relevant in computational mechanics, manifesting in two fundamental problem types: forward and inverse problems. The former addresses how input uncertainties propagate to the quantities of interest, whereas the latter aims to infer unknown parameters from experimental observations or simulations. Since efficient propagation typically requires a prohibitive number of evaluations to compute marginal output distributions, the development of fast, data-driven surrogate models becomes necessary. Thus, we can distinguish between two inverse tasks: (i) the identification and calibration of input uncertainties, and (ii) the construction of surrogates, a methodology collectively referred to as surrogate-based UQ. Building on probabilistic reasoning and the concept of partial belief, we demonstrate that Bayesian probability theory provides a unified theoretical framework for addressing both problem types. We further show that Bayesian inference allows for the seamless incorporation of essential subproblems, including model selection for identifying the most probable model specifications and experimental design for optimizing data collection by identifying experiments or simulations that maximize expected information gain about parameters, among others such as connections to sensitivity analysis or the use of special priors like random fields. While this theoretical framework is presented for general mechanical problems, particular emphasis is placed on biomechanics, where variability and uncertainty are especially pronounced due to inherent biological heterogeneity, patient-specific variability, and noisy data.

https://papers.cool/arxiv/2607.18431 Using binary silver labels in electronic health records-based computable phenotyping algorithms 2026-07-20T18:32:41+00:00 Shuhe Wang Matthew T. Slaughter Jennifer C. Nelson Brian D. Williamson

Gold-standard phenotype labels are often unavailable at scale in electronic health record (EHR) studies because they require manual chart review. Weakly supervised phenotyping methods instead use silver-standard labels, such as diagnosis-code counts, natural language processing (NLP) mentions, medication indicators, or laboratory thresholds. PheNorm is widely used for this purpose, but its original formulation was designed for count-valued silver labels and relies on log transformation, utilization normalization, and Gaussian mixture modeling. These steps are not directly suited to binary silver labels, which are common and may be highly informative. We propose Binary PheNorm, an extension that uses binary silver labels directly in the corruption-and-regression denoising step and produces a continuous phenotype score without EM calibration. We also consider a lasso-regularized version for high-dimensional EHR settings and combined models using both binary and count labels. In simulations, Binary PheNorm achieved strong discrimination using binary labels alone and often improved performance when combined with count labels. In anaphylaxis, AUC increased from 0.793 for an epinephrine-mention indicator to 0.891-0.892 after Binary PheNorm. In acute pancreatitis, AUC increased from 0.736 for a lipase-threshold indicator to 0.805-0.819. These results support Binary PheNorm as a practical weakly supervised approach when informative binary silver labels are available.

https://papers.cool/arxiv/2607.18422 PAC--Bayes Bounds on Quotient Parameter Spaces: Geometry-induced Implicit-Bias Priors 2026-07-20T18:12:00+00:00 Nicola Aladrah Fabio Anselmi

Overparameterized models often have continuous parameter symmetries, so different parameters define the same predictor. We show that PAC--Bayesian analysis should be performed on the quotient predictor space: pushing a prior and posterior to the quotient preserves the empirical and population Gibbs risks while removing the nonnegative KL contribution caused solely by how the two distributions differ among parameterizations of the same predictor. Quotienting alone does not determine which prior to use. We construct a canonical choice of one parameterization for each predictor and account for the geometric volume of its equivalent parameterizations. This transforms a neutral reference prior into a data-independent prior that reflects the model's implicit bias. It approximates the ideal but inadmissible posterior-matched prior, which would minimize the KL term by depending on the training data. The resulting certificate is tighter exactly when this geometry-induced prior has smaller KL divergence from the learned quotient posterior than the neutral prior. We test this prediction in Fourier regression with a Hadamard parameterization and in Query-Key attention, using ordinary SGD without an explicit regularizer. The implicit-bias prior reduces the mean quotient-space KL by $40.69\%$ and the mean PAC--Bayes certificate by $21.40\%$ in the Fourier-Hadamard experiment. The smaller, prior-scale-dependent improvement in Query-Key attention confirms the predicted conditional nature of the effect.
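To see how a smaller KL term tightens a certificate, the sketch below evaluates a standard McAllester-style PAC-Bayes bound, $\hat L + \sqrt{(\mathrm{KL} + \ln(2\sqrt{n}/δ))/(2n)}$. The abstract does not state which bound the authors use, so this particular form, the function name, and the example numbers are assumptions and are not meant to reproduce the reported percentages.

```python
import math

def mcallester_certificate(emp_risk, kl, n, delta=0.05):
    """McAllester-style PAC-Bayes upper bound on the Gibbs risk (assumed form).
    emp_risk: empirical Gibbs risk, kl: KL(posterior || prior), n: sample size."""
    complexity = (kl + math.log(2.0 * math.sqrt(n) / delta)) / (2.0 * n)
    return emp_risk + math.sqrt(complexity)
```

Because the KL term enters under a square root, alongside a logarithmic term and the empirical risk, a given relative reduction in KL produces a smaller relative reduction in the certificate.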