
Statistics

2025-05-13 | Total: 90

#1 Moderation effects and elasticities in compositional regression with a total. Application to Bayesian spatiotemporal modelling of all-cause mortality from environmental stressors

Authors: Germà Coenders, Javier Palarea-Albaladejo, Marc Saez, Maria A. Barceló

Compositional regression models with a real-valued response variable can generally be specified as log-contrast models subject to a zero-sum constraint on the model coefficients. This formulation emphasises the relative information conveyed in the composition, while the overall total is regarded as irrelevant. In this work, such a setting is extended to account not only for total effects, formally defined in a so-called T-space, but also for moderation or interaction effects. This is applied in the context of complex spatiotemporal data modelling, through an adaptation of the integrated nested Laplace approximation (INLA) method within a Bayesian estimation framework. Particular emphasis is placed on the interpretation of model coefficients and results, both on the original scale of the response variable and in terms of elasticities. The methodology is demonstrated through a detailed case study investigating the relationship between all-cause mortality and the interaction between extreme temperatures, air pollution composition, and total air pollution in Catalonia, Spain, during the summer of 2022. The results indicate that extreme temperatures are associated with an increased risk of mortality four days after exposure. Additionally, exposure to total air pollution, especially to NO2, is linked to elevated mortality risk regardless of temperature. In contrast, particulate matter is associated with increased mortality only when exposure occurs on days of extreme heat.

Subjects: Methodology , Applications

Publish: 2025-05-12 17:49:30 UTC
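
The zero-sum constraint is what makes a log-contrast model blind to the total. Below is a minimal Python sketch. It assumes a plain linear predictor y = b0 + sum_j b_j log x_j. It also adds a hypothetical log-total term (`total_coef`) as a stand-in for the paper's T-space effect. The moderation terms and the INLA estimation are not reproduced. Rescaling every part by the same factor leaves the log-contrast part unchanged and shifts only the total term.

```python
import math
from typing import Sequence

def log_contrast_predict(parts: Sequence[float], coefs: Sequence[float],
                         intercept: float = 0.0, total_coef: float = 0.0) -> float:
    """Log-contrast predictor plus a hypothetical log-total term (illustrative only)."""
    if len(parts) != len(coefs):
        raise ValueError("parts and coefs must have the same length")
    if abs(sum(coefs)) > 1e-9:
        raise ValueError("log-contrast coefficients must sum to zero")
    relative = sum(b * math.log(x) for b, x in zip(coefs, parts))  # scale-invariant part
    total = total_coef * math.log(sum(parts))                      # depends on the total only
    return intercept + relative + total

parts = [10.0, 5.0, 2.5]
coefs = [0.5, -0.2, -0.3]  # sums to zero
y1 = log_contrast_predict(parts, coefs, intercept=1.0)
y2 = log_contrast_predict([3 * x for x in parts], coefs, intercept=1.0)  # same relative info
```

With `total_coef=0` the two predictions coincide. With a non-zero `total_coef` they differ by exactly `total_coef * log(3)`.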


#2 Analytic theory of dropout regularization

Authors: Francesco Mori, Francesca Mignacco

Dropout is a regularization technique widely used in training artificial neural networks to mitigate overfitting. It consists of dynamically deactivating subsets of the network during training to promote more robust representations. Despite its widespread adoption, dropout probabilities are often selected heuristically, and theoretical explanations of its success remain sparse. Here, we analytically study dropout in two-layer neural networks trained with online stochastic gradient descent. In the high-dimensional limit, we derive a set of ordinary differential equations that fully characterize the evolution of the network during training and capture the effects of dropout. We obtain a number of exact results describing the generalization error and the optimal dropout probability at short, intermediate, and long training times. Our analysis shows that dropout reduces detrimental correlations between hidden nodes and mitigates the impact of label noise, and that the optimal dropout probability increases with the level of noise in the data. Our results are validated by extensive numerical simulations.

Subjects: Machine Learning , Disordered Systems and Neural Networks , Statistical Mechanics , Machine Learning

Publish: 2025-05-12 17:45:02 UTC
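
For readers unfamiliar with the mechanics, here is a minimal NumPy sketch of dropout in a two-layer network. It assumes tanh hidden units and the common "inverted" scaling by 1/(1 - p). The paper's own scaling convention and its ODE analysis are not reproduced. The output is linear in the hidden activations, so averaging many dropout forward passes recovers the deterministic output.

```python
import numpy as np

def two_layer_forward(x, W, v, p_drop, rng, train=True):
    """Output v . tanh(W x) of a two-layer network, with inverted dropout on hidden units."""
    if not 0.0 <= p_drop < 1.0:
        raise ValueError("p_drop must lie in [0, 1)")
    h = np.tanh(W @ x)                          # hidden-unit activations
    if train and p_drop > 0.0:
        keep = rng.random(h.shape) >= p_drop    # keep each unit with probability 1 - p_drop
        h = h * keep / (1.0 - p_drop)           # rescale so that E[h] is unchanged
    return float(v @ h)
```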


#3 Nonparametric Instrumental Variable Inference with Many Weak Instruments

Authors: Lars van der Laan, Nathan Kallus, Aurélien Bibaut

We study inference on linear functionals in the nonparametric instrumental variable (NPIV) problem with a discretely-valued instrument under a many-weak-instruments asymptotic regime, where the number of instrument values grows with the sample size. A key motivating example is estimating long-term causal effects in a new experiment with only short-term outcomes, using past experiments to instrument for the effect of short- on long-term outcomes. Here, the assignment to a past experiment serves as the instrument: we have many past experiments but only a limited number of units in each. Since the structural function is nonparametric but constrained by only finitely many moment restrictions, point identification typically fails. To address this, we consider linear functionals of the minimum-norm solution to the moment restrictions, which is always well-defined. As the number of instrument levels grows, these functionals define an approximating sequence to a target functional, replacing point identification with a weaker asymptotic notion suited to discrete instruments. Extending the Jackknife Instrumental Variable Estimator (JIVE) beyond the classical parametric setting, we propose npJIVE, a nonparametric estimator for solutions to linear inverse problems with many weak instruments. We construct automatic debiased machine learning estimators for linear functionals of both the structural function and its minimum-norm projection, and establish their efficiency in the many-weak-instruments regime.

Subjects: Methodology , Statistics Theory , Machine Learning

Publish: 2025-05-12 16:36:55 UTC
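
npJIVE extends the classical jackknife IV estimator, so the classical version is shown here as a point of reference only. This is a sketch of the parametric JIVE1 form. It assumes a single endogenous regressor, no intercept or covariates, and a discrete instrument encoded as group membership. It is not the paper's nonparametric estimator. Each unit's first-stage fit is the mean of the regressor over the other units at the same instrument level. Leaving the unit out removes the own-observation bias that appears when each level has few units.

```python
import numpy as np

def jive1(y, x, groups):
    """Classical JIVE1 with a discrete instrument (one dummy per instrument level)."""
    y, x, groups = map(np.asarray, (y, x, groups))
    _, inv, counts = np.unique(groups, return_inverse=True, return_counts=True)
    if np.any(counts < 2):
        raise ValueError("every instrument level needs at least two units")
    sums = np.bincount(inv, weights=x)             # per-level sum of x
    x_hat = (sums[inv] - x) / (counts[inv] - 1)    # leave-one-out level mean
    return float(np.sum(x_hat * y) / np.sum(x_hat * x))
```

Consider simulated data with many levels and only three units per level, where the error in x is correlated with the outcome error. Here JIVE1 stays close to the true slope, whereas plain OLS is visibly biased.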


#4 Transfer Learning Across Fixed-Income Product Classes

Authors: Nicolas Camenzind, Damir Filipovic

We propose a framework for transfer learning of discount curves across different fixed-income product classes. Motivated by challenges in estimating discount curves from sparse or noisy data, we extend kernel ridge regression (KR) to a vector-valued setting, formulating a convex optimization problem in a vector-valued reproducing kernel Hilbert space (RKHS). Each component of the solution corresponds to the discount curve implied by a specific product class. We introduce an additional regularization term motivated by economic principles, promoting smoothness of spread curves between product classes, and show that it leads to a valid separable kernel structure. A main theoretical contribution is a decomposition of the vector-valued RKHS norm induced by separable kernels. We further provide a Gaussian process interpretation of vector-valued KR, enabling quantification of estimation uncertainty. Illustrative examples demonstrate that transfer learning significantly improves extrapolation performance and tightens confidence intervals compared to single-curve estimation.

Subjects: Machine Learning , Machine Learning , Computational Finance , Mathematical Finance

Publish: 2025-05-12 15:43:29 UTC
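
Here is a minimal sketch of vector-valued kernel ridge regression with a separable kernel k(x, x') B. It makes three assumptions: a Gaussian scalar kernel, outputs observed directly at shared inputs, and a hypothetical output-coupling matrix `B`. In the paper, curves are observed through product prices and the kernel structure comes from the spread-smoothness penalty; neither is reproduced here. With `B` equal to the identity, the fit decouples into independent scalar regressions. Off-diagonal entries of `B` let one output borrow strength from another.

```python
import numpy as np

def gaussian_kernel(a, b, length_scale=1.0):
    """Scalar Gaussian kernel matrix between 1-D input arrays a and b."""
    d = a[:, None] - b[None, :]
    return np.exp(-0.5 * (d / length_scale) ** 2)

def fit_predict_separable_krr(x, Y, x_new, B, lam=1e-2, length_scale=1.0):
    """Vector-valued KRR with kernel K((x, s), (x', t)) = k(x, x') * B[s, t].

    x: (n,) shared inputs; Y: (n, T) observations; B: (T, T) positive semidefinite.
    Returns predictions of shape (len(x_new), T).
    """
    n, T = Y.shape
    K = gaussian_kernel(x, x, length_scale)
    G = np.kron(B, K)                               # Gram matrix, outputs stacked block-wise
    alpha = np.linalg.solve(G + lam * np.eye(n * T), Y.T.reshape(-1))
    K_new = np.kron(B, gaussian_kernel(x_new, x, length_scale))
    return (K_new @ alpha).reshape(T, -1).T
```

For example, `B = np.array([[1.0, 0.9], [0.9, 1.0]])` couples two curves strongly. The helper names and this choice of `B` are illustrative.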


#5 Separable models for dynamic signed networks

Authors: Alberto Caimo, Isabella Gollini

Signed networks capture the polarity of relationships between nodes, providing valuable insights into complex systems where both supportive and antagonistic interactions play a critical role in shaping the network's dynamics. We propose a separable temporal generative framework based on multi-layer exponential random graph models, characterised by the assumption of conditional independence between the sign and interaction effects. This structure preserves the flexibility and explanatory power inherent in the binary network specification while adhering to consistent balance theory assumptions. Using a fully probabilistic Bayesian paradigm, we infer the doubly intractable posterior distribution of model parameters via an adaptive Metropolis-Hastings approximate exchange algorithm. We illustrate the interpretability of our model by analysing signed relations among U.S. Senators during Ronald Reagan's second term (1985-1989). Specifically, we aim to understand whether these relations are consistent and balanced or reflect patterns of supportive or antagonistic alliances.

Subjects: Methodology , Applications

Publish: 2025-05-12 15:33:57 UTC
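
Structural balance is the notion the model is designed to respect. The sketch below only counts balanced and unbalanced closed triads in a signed network. A triangle is balanced when the product of its three edge signs is positive. The data format (a dict keyed by unordered node pairs) is an assumption. The multi-layer ERGM and the exchange algorithm are not reproduced.

```python
from itertools import combinations

def triad_balance(signs):
    """Count (balanced, unbalanced) closed triads in a signed undirected network.

    signs: dict mapping frozenset({u, v}) -> +1 or -1.
    """
    nodes = sorted({n for edge in signs for n in edge})
    balanced = unbalanced = 0
    for a, b, c in combinations(nodes, 3):
        edges = [frozenset((a, b)), frozenset((b, c)), frozenset((a, c))]
        if all(e in signs for e in edges):          # only closed triangles count
            product = signs[edges[0]] * signs[edges[1]] * signs[edges[2]]
            if product > 0:
                balanced += 1
            else:
                unbalanced += 1
    return balanced, unbalanced

signs = {
    frozenset(("A", "B")): 1, frozenset(("B", "C")): -1, frozenset(("A", "C")): -1,
    frozenset(("C", "D")): 1, frozenset(("B", "D")): 1,
}
counts = triad_balance(signs)  # A-B-C is balanced (+,-,-); B-C-D is not (-,+,+)
```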


#6 An investigation of air pollution-induced temperature sensitivity and susceptibility to heat-related hospitalization in the Medicare population

Authors: Lauren Mock, Rachel C. Nethery, Poonam Gandhi, Ashwaghosha Parthasarathi, Melanie Rua, David Robinson, Soko Setoguchi, Kevin Josey

Background: With rising temperatures and an aging US population, understanding how to prevent heat-related illness among older Americans will be an increasingly critical objective. Despite biological plausibility, no study to date has investigated how exposure to fine particulate matter air pollution (PM2.5) may contribute to risk of heat-related hospitalization.

Methods: We identified Medicare fee-for-service beneficiaries who experienced a heat-related hospitalization between 2008 and 2016. Using a case-crossover design and fitting Bayesian conditional logistic regression models, we characterized the association between heat-related hospitalization and temperature and PM2.5 exposures. We estimated the relative excess risk due to interaction (RERI) to quantify the additive-scale interaction of simultaneous exposure to heat and PM2.5.

Results: We observed 112,969 heat-related hospitalizations. Fixing PM2.5 at the case day median, increasing temperature from its case day median to the 95th percentile was associated with an odds ratio of 1.045 (95% CI: 1.026, 1.063). Fixing temperature at the case day median and increasing PM2.5 from its median to the 95th percentile was associated with an odds ratio of 1.014 (95% CI: 0.993, 1.037). We estimated the RERI associated with simultaneous median-to-95th percentile increases in temperature and PM2.5 to be 0.032 (0.007, 0.057).

Conclusion: Using nationwide Medicare claims and a self-matched study design, we found evidence supporting synergism between temperature and PM2.5 exposures on the risk of heat-related hospitalization.

Subject: Applications

Publish: 2025-05-12 15:30:06 UTC
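
When odds ratios approximate relative risks, RERI is commonly computed as OR11 - OR10 - OR01 + 1. Here OR11 is the odds ratio for joint exposure, and OR10 and OR01 are the odds ratios for each exposure alone. A positive value indicates super-additive interaction (synergism). The sketch below uses hypothetical inputs, not the study's estimates.

```python
def reri(or_both: float, or_a_only: float, or_b_only: float) -> float:
    """Relative excess risk due to interaction, with odds ratios standing in for risk ratios."""
    return or_both - or_a_only - or_b_only + 1.0

example = reri(1.5, 1.2, 1.1)  # hypothetical odds ratios; positive -> super-additive
```

In a Bayesian fit, one would typically apply the same formula to each posterior draw and summarize the resulting draws. The function works element-wise on NumPy arrays. This describes a plausible workflow, not the paper's code.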


#7 Constructing Bayes Minimax Estimators through Integral Transformations

Authors: Dominique Fourdrinier, William E. Strawderman, Martin T. Wells

The problem of Bayes minimax estimation for the mean of a multivariate normal distribution under quadratic loss has attracted significant attention recently. These estimators have the advantageous property of being admissible, similar to Bayes procedures, while also providing the conservative risk guarantees typical of frequentist methods. This paper demonstrates that Bayes minimax estimators can be derived using integral transformation techniques, specifically through the I-transform and the Laplace transform, as long as appropriate spherical priors are selected. Several illustrative examples are included to highlight the effectiveness of the proposed approach.

Subject: Statistics Theory

Publish: 2025-05-12 15:21:13 UTC


#8 Certified Data Removal Under High-dimensional Settings

Authors: Haolin Zou, Arnab Auddy, Yongchan Kwon, Kamiar Rahnama Rad, Arian Maleki

Machine unlearning focuses on the computationally efficient removal of specific training data from trained models, ensuring that the influence of forgotten data is effectively eliminated without the need for full retraining. Despite advances in low-dimensional settings, where the number of parameters p is much smaller than the sample size n, extending similar theoretical guarantees to high-dimensional regimes remains challenging. We propose an unlearning algorithm that starts from the original model parameters and performs a theory-guided sequence of $T \in \{1, 2\}$ Newton steps. After this update, carefully scaled isotropic Laplacian noise is added to the estimate to ensure that any (potential) residual influence of the forgotten data is completely removed. We show that when both $n, p \to \infty$ with a fixed ratio $n/p$, significant theoretical and computational obstacles arise due to the interplay between the complexity of the model and the finite signal-to-noise ratio. Finally, we show that, unlike in low-dimensional settings, a single Newton step is insufficient for effective unlearning in high-dimensional problems; however, two steps are enough to achieve the desired certifiability. We provide numerical experiments to support the certifiability and accuracy claims of this approach.

Subjects: Machine Learning , Machine Learning

Publish: 2025-05-12 15:11:13 UTC
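
To make the update concrete, here is a sketch of Newton-step removal for L2-regularized logistic regression. It assumes the retained-data objective is the full objective with the forgotten points' losses dropped. The Laplace `noise_scale` is a placeholder, because the paper's calibration is not given here. The example is low-dimensional, so it shows only the mechanics: each Newton step moves the estimate closer to full retraining. It does not reproduce the high-dimensional result that two steps are needed.

```python
import numpy as np

def _grad_hess(theta, X, y, lam):
    """Gradient and Hessian of sum_i logistic loss + (lam / 2) ||theta||^2, y in {0, 1}."""
    p = 1.0 / (1.0 + np.exp(-X @ theta))
    grad = X.T @ (p - y) + lam * theta
    hess = (X * (p * (1.0 - p))[:, None]).T @ X + lam * np.eye(X.shape[1])
    return grad, hess

def fit_logistic(X, y, lam, n_iter=50):
    """Full fit by Newton's method, used both for the original model and for retraining."""
    theta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        g, H = _grad_hess(theta, X, y, lam)
        theta -= np.linalg.solve(H, g)
    return theta

def newton_unlearn(theta_full, X, y, forget_idx, lam, steps=2, noise_scale=0.0, rng=None):
    """Take `steps` Newton steps on the retained-data objective from theta_full,
    then optionally add isotropic Laplace noise (noise_scale is a placeholder)."""
    keep = np.setdiff1d(np.arange(len(y)), forget_idx)
    theta = theta_full.copy()
    for _ in range(steps):
        g, H = _grad_hess(theta, X[keep], y[keep], lam)
        theta = theta - np.linalg.solve(H, g)
    if noise_scale > 0:
        rng = np.random.default_rng() if rng is None else rng
        theta = theta + rng.laplace(0.0, noise_scale, size=theta.shape)
    return theta
```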


#9 Modelling higher education dropouts using sparse and interpretable post-clustering logistic regression

Authors: Andrea Nigri, Massimo Bilancia, Barbara Cafarelli, Samuele Magro

Higher education dropout constitutes a critical challenge for tertiary education systems worldwide. While machine learning techniques can achieve high predictive accuracy on selected datasets, their adoption by policymakers remains limited and unsatisfactory, particularly when the objective is the unsupervised identification and characterization of student subgroups at elevated risk of dropout. The model introduced in this paper is a specialized form of logistic regression, specifically adapted to the context of university dropout analysis. Logistic regression continues to serve as a foundational tool among reliable statistical models, primarily due to the ease with which its parameters can be interpreted in terms of odds ratios. Our approach significantly extends this framework by incorporating heterogeneity within the student population. This is achieved through the application of a preliminary clustering algorithm that identifies latent subgroups, each characterized by distinct dropout propensities, which are then modeled via cluster-specific effects. We provide a detailed interpretation of the model parameters within this extended framework and enhance interpretability by imposing sparsity through a tailored variant of the LASSO algorithm. To demonstrate the practical applicability of the proposed methodology, we present an extensive case study based on the Italian university system, in which all the developed tools are systematically applied.

Subjects: Applications , Machine Learning

Publish: 2025-05-12 14:05:23 UTC


#10 A Value of Information-based assessment of strain-based thickness loss monitoring in ship hull structures

Authors: Nicholas E. Silionis, Konstantinos N. Anyfantis

Recent advances in Structural Health Monitoring (SHM) have attracted industry interest, yet real-world applications, such as in ship structures, remain scarce. Despite SHM's potential to optimise maintenance, its adoption in ships is limited due to the lack of clearly quantifiable benefits for hull maintenance. This study employs a Bayesian pre-posterior decision analysis to quantify the value of information (VoI) from SHM systems monitoring corrosion-induced thickness loss (CITL) in ship hulls, in a first-of-its-kind analysis for ship structures. We define decision-making consequence cost functions based on exceedance probabilities relative to a target CITL threshold, which can be set by the decision-maker. This introduces a practical aspect to our framework, which enables implicit modelling of the decision-maker's risk perception. We apply this framework to a large-scale, high-fidelity numerical model of a commercial vessel and examine the relative benefits of different CITL monitoring strategies, including strain-based SHM and traditional on-site inspections.

Subject: Applications

Publish: 2025-05-12 10:34:41 UTC
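
Pre-posterior analysis compares two quantities. The first is the best expected cost of acting on the prior alone. The second is the expected value, over the monitoring outcome, of the best expected cost after updating on that outcome. Below is a minimal sketch with two states, two actions and two signals. All probabilities and costs are hypothetical and far simpler than the paper's exceedance-probability cost functions.

```python
def expected_cost(p_state, costs, action):
    """Expected cost of an action under a distribution over states."""
    return sum(p * costs[action][s] for s, p in p_state.items())

def best_expected_cost(p_state, costs):
    """Expected cost of the optimal action under p_state."""
    return min(expected_cost(p_state, costs, a) for a in costs)

def value_of_information(prior, likelihood, costs):
    """VoI = prior-optimal expected cost - expected posterior-optimal cost.

    prior: {state: prob}; likelihood: {signal: {state: P(signal | state)}};
    costs: {action: {state: cost}}.
    """
    cost_without = best_expected_cost(prior, costs)
    cost_with = 0.0
    for lik in likelihood.values():
        p_signal = sum(lik[s] * prior[s] for s in prior)
        if p_signal == 0:
            continue
        posterior = {s: lik[s] * prior[s] / p_signal for s in prior}  # Bayes update
        cost_with += p_signal * best_expected_cost(posterior, costs)
    return cost_without - cost_with

prior = {"ok": 0.8, "excess": 0.2}
likelihood = {"alarm": {"ok": 0.1, "excess": 0.9}, "no_alarm": {"ok": 0.9, "excess": 0.1}}
costs = {"do_nothing": {"ok": 0.0, "excess": 100.0}, "repair": {"ok": 10.0, "excess": 10.0}}
voi = value_of_information(prior, likelihood, costs)  # 10 - 4.6 = 5.4, up to rounding
```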


#11 Some insights into depth estimators for location and scatter in the multivariate setting

Authors: Jorge G. Adrover, Marcelo Ruiz

The concept of statistical depth has received considerable attention as a way to extend the notions of the median and quantiles to other statistical models. These procedures aim to formalize the idea of identifying deeply embedded fits to a model that are less influenced by contamination. Since contamination introduces bias in estimators, it is well known in the location model that the median minimizes the maximum bias among all equivariant estimators. In the multivariate case, Tukey's median was a groundbreaking concept for location estimation, and its counterpart for scatter matrices has recently attracted considerable interest. The breakdown point and the maximum asymptotic bias are key concepts used to summarize an estimator's behavior under contamination. For the location and scale model, we consider two closely related depth formulations, whose deepest estimators display significantly different behavior in terms of breakdown point. In the multivariate setting, we analyze recently introduced concentration inequalities that provide a unified framework for studying both the statistical convergence rate and robustness of Tukey's median and depth-based scatter matrices. We observe that slight variations in these inequalities allow us to visualize the maximum bias behavior of the deepest estimators. Since the maximum bias for depth-based scatter matrices had not previously been derived, we explicitly calculate both the breakdown point and the maximum bias curve for the deepest scatter matrices.

Subject: Statistics Theory

Publish: 2025-05-12 09:29:11 UTC
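
In one dimension, the halfspace (Tukey) depth of a point is the smaller fraction of observations lying on either closed side of it. The deepest point is a sample median. The minimal sketch below shows why such estimators resist contamination: moving one observation arbitrarily far does not change the deepest point. It covers only the univariate location case, not the scatter-matrix depths studied in the paper.

```python
def halfspace_depth_1d(t, data):
    """Tukey depth of t in a univariate sample: min fraction of points on either side."""
    n = len(data)
    below = sum(1 for x in data if x <= t)
    above = sum(1 for x in data if x >= t)
    return min(below, above) / n

def deepest_point_1d(data):
    """A sample point of maximal depth (the first such point in sorted order)."""
    return max(sorted(data), key=lambda t: halfspace_depth_1d(t, data))

clean = [1, 2, 3, 100, 4]
contaminated = [1, 2, 3, 1e6, 4]   # one observation pushed far away
```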


#12 Causal mediation analysis with one or multiple mediators: a comparative study

Authors: Judith Abécassis, Houssam Zenati, Sami Boumaïza, Julie Josse, Bertrand Thirion

Mediation analysis breaks down the causal effect of a treatment on an outcome into an indirect effect, acting through a third group of variables called mediators, and a direct effect, operating through other mechanisms. Mediation analysis is hard because confounders between treatment, mediators, and outcome blur effect estimates in observational studies. Many estimators have been proposed to adjust for those confounders and provide accurate causal estimates. We consider parametric and non-parametric implementations of classical estimators and provide a thorough evaluation of the estimation of the direct and indirect effects in the context of causal mediation analysis for binary, continuous, and multi-dimensional mediators. We assess several approaches in a comprehensive benchmark on simulated data. Our results show that advanced statistical approaches such as the multiply robust and the double machine learning estimators perform well in most of the simulated settings and on real data. As an example of application, we propose a thorough analysis of factors known to influence cognitive functions to assess whether the mechanism involves modifications in brain morphology using the UK Biobank brain imaging cohort. This analysis shows that for several physiological factors, such as hypertension and obesity, a substantial part of the effect is mediated by changes in the brain structure. This work provides guidance to the practitioner from the formulation of a valid causal mediation problem, including the verification of the identification assumptions, to the choice of an adequate estimator.

Subjects: Applications , Machine Learning

Publish: 2025-05-12 08:10:50 UTC
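
As a baseline for the estimators compared above, here is the classical product-of-coefficients approach for one continuous mediator. It assumes linear mediator and outcome models, no treatment-mediator interaction, and no unmeasured confounding. It is a textbook baseline, not one of the multiply robust or double machine learning estimators the paper recommends.

```python
import numpy as np

def linear_mediation(t, m, y):
    """Return (direct, indirect) effects from m ~ 1 + t and y ~ 1 + t + m fitted by OLS."""
    ones = np.ones_like(t)
    a = np.linalg.lstsq(np.column_stack([ones, t]), m, rcond=None)[0][1]     # t -> m
    coef = np.linalg.lstsq(np.column_stack([ones, t, m]), y, rcond=None)[0]
    direct, b = coef[1], coef[2]                                             # t -> y, m -> y
    return float(direct), float(a * b)

# Simulated example: true direct effect 0.7, true indirect effect 1.5 * 2 = 3.0
rng = np.random.default_rng(4)
n = 5000
t = rng.integers(0, 2, n).astype(float)
m = 0.5 + 1.5 * t + rng.standard_normal(n)
y = 1.0 + 0.7 * t + 2.0 * m + rng.standard_normal(n)
direct, indirect = linear_mediation(t, m, y)
```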


#13 GMM with Many Weak Moment Conditions and Nuisance Parameters: General Theory and Applications to Causal Inference

Authors: Rui Wang, Kwun Chuen Gary Chan, Ting Ye

Weak identification is a common issue for many statistical problems -- for example, when instrumental variables are weakly correlated with treatment, or when proxy variables are weakly correlated with unmeasured confounders. Under weak identification, standard estimation methods, such as the generalized method of moments (GMM), can have sizeable bias in finite samples or even asymptotically. In addition, many practical settings involve a growing number of nuisance parameters, adding further complexity to the problem. In this paper, we study estimation and inference under a general nonlinear moment model with many weak moment conditions and many nuisance parameters. To obtain debiased inference for finite-dimensional target parameters, we demonstrate that Neyman orthogonality plays a stronger role than in conventional settings with strong identification. We study a general two-step debiasing estimator that allows for possibly nonparametric first-step estimation of nuisance parameters, and we establish its consistency and asymptotic normality under a many weak moment asymptotic regime. Our theory accommodates both high-dimensional moment conditions and infinite-dimensional nuisance parameters. We provide high-level assumptions for a general setting and discuss specific applications to the problems of estimation and inference with weak instruments and weak proxies.

Subjects: Statistics Theory , Methodology

Publish: 2025-05-12 07:31:48 UTC


#14 On Data Sharpening in Nonparametric Autoregressive Models

Authors: Simon Snyman, Lengyi Han, W. John Braun

Data sharpening has been shown to reduce bias in nonparametric regression and density estimation. Its performance on nonlinear first-order autoregressive models is studied theoretically and numerically in this paper. Although the asymptotic properties of data sharpening are not as favourable in the presence of serial dependence as in bivariate regression with independent responses, it is still found to reduce bias under mild conditions on the autoregression function. Numerical comparisons with the bias reduction method of Cheng et al. (2018) indicate that data sharpening is competitive in this setting.

Subject: Methodology

Publish: 2025-05-12 07:12:35 UTC


#15 FCPCA: Fuzzy clustering of high-dimensional time series based on common principal component analysis

Authors: Ziling Ma, Ángel López-Oriona, Hernando Ombao, Ying Sun

Clustering multivariate time series data is a crucial task in many domains, as it enables the identification of meaningful patterns and groups in time-evolving data. Traditional approaches, such as crisp clustering, rely on the assumption that clusters are sufficiently separated with little overlap. However, real-world data often defy this assumption, exhibiting overlapping distributions or overlapping clouds of points and blurred boundaries between clusters. Fuzzy clustering offers a compelling alternative by allowing partial membership in multiple clusters, making it well-suited for these ambiguous scenarios. Despite its advantages, current fuzzy clustering methods primarily focus on univariate time series, and for multivariate cases, even datasets of moderate dimensionality become computationally prohibitive. This challenge is further exacerbated when dealing with time series of varying lengths, leaving a clear gap in addressing the complexities of modern datasets. This work introduces a novel fuzzy clustering approach based on common principal component analysis to address the aforementioned shortcomings. Our method has the advantage of efficiently handling high-dimensional multivariate time series by reducing dimensionality while preserving critical temporal features. Extensive numerical results show that our proposed clustering method outperforms several existing approaches in the literature. An interesting application involving brain signals from different drivers recorded from a simulated driving experiment illustrates the potential of the approach.

Subjects: Methodology , Applications , Machine Learning

Publish: 2025-05-12 06:59:17 UTC
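
For reference, the classical fuzzy c-means updates, which give each point a partial membership in every cluster, are sketched below. The sketch assumes Euclidean distances on raw feature vectors and fuzzifier m = 2. FCPCA instead measures dissimilarity through common principal components of the series, which this sketch does not implement.

```python
import numpy as np

def fuzzy_memberships(X, centers, m=2.0, eps=1e-12):
    """Fuzzy c-means membership update.

    X: (n, d) points; centers: (c, d). Returns U of shape (n, c), rows summing to 1,
    with u[i, k] = 1 / sum_j (d_ik / d_ij) ** (2 / (m - 1)).
    """
    d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2) + eps  # (n, c) distances
    ratio = (d[:, :, None] / d[:, None, :]) ** (2.0 / (m - 1.0))           # ratio[i, k, j]
    return 1.0 / ratio.sum(axis=2)

def update_centers(X, U, m=2.0):
    """Fuzzy c-means center update: membership-weighted means with weights u ** m."""
    W = U ** m
    return (W.T @ X) / W.sum(axis=0)[:, None]
```

A point equidistant from two centers gets membership 0.5 in each, whereas a point sitting on a center gets membership close to 1 in that cluster.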


#16 ALPCAH: Subspace Learning for Sample-wise Heteroscedastic Data

Authors: Javier Salazar Cavazos, Jeffrey A. Fessler, Laura Balzano

Principal component analysis (PCA) is a key tool in the field of data dimensionality reduction. However, some applications involve heterogeneous data that vary in quality due to noise characteristics associated with each data sample. Heteroscedastic methods aim to deal with such mixed data quality. This paper develops a subspace learning method, named ALPCAH, that can estimate the sample-wise noise variances and use this information to improve the estimate of the subspace basis associated with the low-rank structure of the data. Our method makes no distributional assumptions about the low-rank component and does not assume that the noise variances are known. Further, this method uses a soft rank constraint that does not require the subspace dimension to be known. Additionally, this paper develops a matrix factorized version of ALPCAH, named LR-ALPCAH, that is much faster and more memory efficient at the cost of requiring the subspace dimension to be known or estimated. Simulations and real data experiments show the effectiveness of accounting for data heteroscedasticity compared to existing algorithms. Code available at https://github.com/javiersc1/ALPCAH.

Subjects: Machine Learning , Machine Learning , Signal Processing

Publish: 2025-05-12 06:49:47 UTC


#17 Adaptive, Robust and Scalable Bayesian Filtering for Online Learning

Author: Gerardo Duran-Martin

In this thesis, we introduce Bayesian filtering as a principled framework for tackling diverse sequential machine learning problems, including online (continual) learning, prequential (one-step-ahead) forecasting, and contextual bandits. To this end, this thesis addresses key challenges in applying Bayesian filtering to these problems: adaptivity to non-stationary environments, robustness to model misspecification and outliers, and scalability to the high-dimensional parameter space of deep neural networks. We develop novel tools within the Bayesian filtering framework to address each of these challenges, including: (i) a modular framework that enables the development of adaptive approaches for online learning; (ii) a novel, provably robust filter with similar computational cost to standard filters, that employs Generalised Bayes; and (iii) a set of tools for sequentially updating model parameters using approximate second-order optimisation methods that exploit the overparametrisation of high-dimensional parametric models such as neural networks. Theoretical analysis and empirical results demonstrate the improved performance of our methods in dynamic, high-dimensional, and misspecified models.

Subjects: Machine Learning , Machine Learning

Publish: 2025-05-12 06:40:29 UTC
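
The simplest instance of Bayesian filtering for online learning is the Kalman filter applied to linear regression. It treats the coefficients as a latent state that may drift as a Gaussian random walk. The minimal NumPy sketch below assumes Gaussian observation noise with known variance. The thesis's adaptive, robust (generalised Bayes) and scalable filters are not reproduced. With zero drift, processing the data one point at a time gives the same posterior as batch Bayesian linear regression.

```python
import numpy as np

def kalman_step(mean, cov, x, y, obs_var, drift_var=0.0):
    """One predict/update step of a Kalman filter for online linear regression."""
    cov = cov + drift_var * np.eye(len(mean))    # predict: random-walk drift of coefficients
    s = float(x @ cov @ x + obs_var)             # predictive variance of y
    gain = cov @ x / s                           # Kalman gain
    mean = mean + gain * (y - float(x @ mean))   # correct with the innovation
    cov = cov - np.outer(gain, x @ cov)          # posterior covariance
    return mean, cov
```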


#18 Spatial Confounding in Multivariate Areal Data Analysis

Authors: Kyle Lin Wu, Sudipto Banerjee

We investigate spatial confounding in the presence of multivariate disease dependence. In the "analysis model perspective" of spatial confounding, adding a spatially dependent random effect can lead to significant variance inflation of the posterior distribution of the fixed effects. The "data generation perspective" views covariates as stochastic and correlated with an unobserved spatial confounder, leading to inferior statistical inference over multiple realizations. While multiple methods have been proposed for adjusting statistical models to mitigate spatial confounding in estimating regression coefficients, results on interactions between spatial confounding and multivariate dependence are very limited. We contribute to this domain by investigating spatial confounding from the analysis and data generation perspectives in a Bayesian coregionalized areal regression model. We derive novel results that distinguish variance inflation due to spatial confounding from inflation based on multicollinearity between predictors and provide insights into the estimation efficiency of a spatial estimator under a spatially confounded data generation model. We demonstrate favorable performance of spatial analysis compared to a non-spatial model in our simulation experiments even in the presence of spatial confounding and a misspecified spatial structure. In this regard, we align with several other authors in the defense of traditional hierarchical spatial models (Gilbert et al., 2025; Khan and Berrett, 2023; Zimmerman and Ver Hoef, 2022) and extend this defense to multivariate areal models. We analyze county-level data from the US on obesity / diabetes prevalence and diabetes-related cancer mortality, comparing the results with and without spatial random effects.

Subject: Methodology

Publish: 2025-05-12 05:14:38 UTC


#19 Enhancing Inference for Small Cohorts via Transfer Learning and Weighted Integration of Multiple Datasets

Authors: Subharup Guha, Mengqi Xu, Yi Li

Lung sepsis remains a significant concern in the Northeastern U.S., yet the national eICU Collaborative Database includes only a small number of patients from this region, highlighting underrepresentation. Understanding clinical variables such as FiO2, creatinine, platelets, and lactate, which reflect oxygenation, kidney function, coagulation, and metabolism, is crucial because these markers influence sepsis outcomes and may vary by sex. Transfer learning helps address small sample sizes by borrowing information from larger datasets, although differences in covariates and outcome-generating mechanisms between the target and external cohorts can complicate the process. We propose a novel weighting method, TRANSfer LeArning wiTh wEights (TRANSLATE), to integrate data from various sources by incorporating domain-specific characteristics through learned weights that align external data with the target cohort. These weights adjust for cohort differences, are proportional to each cohort's effective sample size, and downweight dissimilar cohorts. TRANSLATE offers theoretical guarantees for improved precision and applies to a wide range of estimands, including means, variances, and distribution functions. Simulations and a real-data application to sepsis outcomes in the Northeast cohort, using a much larger sample from other U.S. regions, show that the method enhances inference while accounting for regional heterogeneity.

Subject: Methodology

Publish: 2025-05-11 23:46:13 UTC
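
The abstract states that TRANSLATE's weights are proportional to each cohort's effective sample size. A common definition of effective sample size for a weighted sample is Kish's (sum w)^2 / sum w^2. The sketch below assumes that definition. It shows only how proportional cohort shares could be computed, not how TRANSLATE learns its weights. The cohort names and weights are hypothetical.

```python
def kish_effective_sample_size(weights):
    """Kish's effective sample size (sum w)^2 / sum w^2 for non-negative weights."""
    total = sum(weights)
    sq = sum(w * w for w in weights)
    if sq == 0:
        raise ValueError("at least one weight must be positive")
    return total * total / sq

def proportional_cohort_shares(cohort_weights):
    """Share of each cohort proportional to its Kish effective sample size (illustrative)."""
    ess = {name: kish_effective_sample_size(w) for name, w in cohort_weights.items()}
    total = sum(ess.values())
    return {name: e / total for name, e in ess.items()}

shares = proportional_cohort_shares({"similar": [1, 1, 1, 1], "dissimilar": [1, 0, 0, 0]})
```

In this toy example, the down-weighted cohort ends up with an effective size of 1 and a share of 0.2.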


#20 Constrained Online Decision-Making with Density Estimation Oracles

Authors: Haichen Hu, David Simchi-Levi, Navid Azizan

Contextual online decision-making problems with constraints appear in a wide range of real-world applications, such as personalized recommendation with resource limits, adaptive experimental design, and decision-making under safety or fairness requirements. In this paper, we investigate a general formulation of sequential decision-making with stage-wise feasibility constraints, where at each round, the learner must select an action based on observed context while ensuring that a problem-specific feasibility criterion is satisfied. We propose a unified algorithmic framework that captures many existing constrained learning problems, including constrained bandits, active learning with label budgets, online hypothesis testing with Type I error control, and model calibration. Central to our approach is the concept of upper counterfactual confidence bounds, which enables the design of practically efficient online algorithms with strong theoretical guarantees using any offline conditional density estimation oracle. Technically, to handle feasibility constraints in complex environments, we introduce a generalized notion of the eluder dimension, extending it from the classical setting based on square loss to a broader class of metric-like probability divergences. This allows us to capture the complexity of various density function classes and characterize the utility regret incurred due to feasibility constraint uncertainty. Our result offers a principled foundation for constrained sequential decision-making in both theory and practice.

Subjects: Machine Learning , Machine Learning

Publish: 2025-05-11 19:22:04 UTC


#21 A Sparse Bayesian Learning Algorithm for Estimation of Interaction Kernels in Motsch-Tadmor Model

Authors: Jinchao Feng, Sui Tang

In this paper, we investigate the data-driven identification of asymmetric interaction kernels in the Motsch-Tadmor model based on observed trajectory data. The model under consideration is governed by a class of semilinear evolution equations, where the interaction kernel defines a normalized, state-dependent Laplacian operator that governs collective dynamics. To address the resulting nonlinear inverse problem, we propose a variational framework that reformulates kernel identification using the implicit form of the governing equations, reducing it to a subspace identification problem. We establish an identifiability result that characterizes conditions under which the interaction kernel can be uniquely recovered up to scale. To solve the inverse problem robustly, we develop a sparse Bayesian learning algorithm that incorporates informative priors for regularization, quantifies uncertainty, and enables principled model selection. Extensive numerical experiments on representative interacting particle systems demonstrate the accuracy, robustness, and interpretability of the proposed framework across a range of noise levels and data regimes.

Subjects: Machine Learning , Machine Learning , Dynamical Systems

Publish: 2025-05-11 17:43:32 UTC
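The abstract describes the Motsch-Tadmor interaction as a normalized, state-dependent Laplacian. As a minimal sketch, assuming the first-order form dx_i/dt = sum_j a_ij (x_j - x_i) with row-normalized weights a_ij = phi(|x_j - x_i|) / sum_k phi(|x_k - x_i|), the forward model that a kernel-identification method would fit to trajectory data could look like this. The Gaussian kernel, the explicit Euler integrator and all function names are hypothetical illustration choices, not the authors' code.

```python
import numpy as np


def motsch_tadmor_rhs(x, phi):
    """dx/dt = -L(x) x for a first-order Motsch-Tadmor system (assumed form).

    x   : (N, d) array of agent states
    phi : hypothetical interaction kernel mapping distances to positive weights
    """
    diff = x[None, :, :] - x[:, None, :]       # diff[i, j] = x_j - x_i
    dist = np.linalg.norm(diff, axis=-1)       # pairwise distances r_ij, shape (N, N)
    W = phi(dist)                              # phi(r_ij); self-weight phi(0) kept in the normaliser (assumption)
    A = W / W.sum(axis=1, keepdims=True)       # row-normalised, generally asymmetric weights a_ij
    L = np.eye(len(x)) - A                     # normalised, state-dependent Laplacian
    return -L @ x                              # equals sum_j a_ij (x_j - x_i) since rows of A sum to 1


def simulate(x0, phi, dt=0.01, steps=500):
    """Explicit Euler integration of the system above."""
    x = np.array(x0, dtype=float)
    for _ in range(steps):
        x = x + dt * motsch_tadmor_rhs(x, phi)
    return x


def gaussian_kernel(r):
    """Hypothetical kernel used only to generate example trajectories."""
    return np.exp(-r ** 2)


# Usage: three agents on a line contract towards consensus
x0 = np.array([[0.0], [1.0], [3.0]])
x_final = simulate(x0, gaussian_kernel)
```

Because each row of `A` sums to one, every update is a convex combination of current states, so the spread of the agents cannot grow. In the inverse problem the kernel `phi` is the unknown to be recovered from trajectories; here it is fixed only to produce data.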


#22 Learning curves theory for hierarchically compositional data with power-law distributed features

Authors: Francesco Cagnetta, Hyunmo Kang, Matthieu Wyart

Recent theories suggest that neural scaling laws arise whenever the task is linearly decomposed into power-law distributed units. Alternatively, scaling laws also emerge when data exhibit a hierarchically compositional structure, as is thought to occur in language and images. To unify these views, we consider classification and next-token prediction tasks based on probabilistic context-free grammars, i.e., probabilistic models that generate data via a hierarchy of production rules. For classification, we show that power-law distributed production rules result in a power-law learning curve, with an exponent that depends on the rules' distribution and a large multiplicative constant that depends on the hierarchical structure. By contrast, for next-token prediction, the distribution of production rules controls the local details of the learning curve, but not the exponent describing its large-scale behaviour.

Subjects: Machine Learning , Disordered Systems and Neural Networks , Machine Learning

Publish: 2025-05-11 17:38:40 UTC
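To make the data model concrete, here is a toy sampler in the spirit of the probabilistic context-free grammars described above. It assumes a fixed-depth hierarchy in which every symbol has `m` production rules of length `s`, and rule k is chosen with probability proportional to (k + 1)^-(1 + a). The grammar construction, the exponent parametrization and all names are assumptions; unlike a carefully constructed grammar, randomly drawn right-hand sides here may collide.

```python
import random


def zipf_probs(m, a):
    """Power-law rule probabilities p_k proportional to (k + 1) ** -(1 + a), k = 0..m-1."""
    w = [(k + 1) ** -(1.0 + a) for k in range(m)]
    total = sum(w)
    return [wk / total for wk in w]


def make_grammar(v, m, s, depth, rng):
    """For each level and each of v symbols, draw m random right-hand sides of length s."""
    return [
        {sym: [tuple(rng.randrange(v) for _ in range(s)) for _ in range(m)] for sym in range(v)}
        for _ in range(depth)
    ]


def sample(grammar, probs, root, rng):
    """Expand the root symbol top-down, picking each rule with power-law probabilities."""
    seq = [root]
    for level in grammar:
        nxt = []
        for sym in seq:
            rule = rng.choices(level[sym], weights=probs, k=1)[0]
            nxt.extend(rule)
        seq = nxt
    return seq


# Usage: the root symbol plays the role of the class label
rng = random.Random(0)
grammar = make_grammar(v=8, m=4, s=2, depth=3, rng=rng)
probs = zipf_probs(4, a=0.5)
leaves = sample(grammar, probs, root=0, rng=rng)
```

Each sample has `s ** depth` leaf tokens; frequent rules are learned from few examples while rare ones need many, which is the mechanism the abstract links to power-law learning curves.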


#23 Outperformance Score: A Universal Standardization Method for Confusion-Matrix-Based Classification Performance Metrics

Authors: Ningsheng Zhao, Trang Bui, Jia Yuan Yu, Krzysztof Dzieciolowski

Many classification performance metrics exist, each suited to a specific application. However, these metrics often differ in scale and can exhibit varying sensitivity to class imbalance rates in the test set. As a result, it is difficult to use the nominal values of these metrics to interpret and evaluate classification performance, especially when imbalance rates vary. To address this problem, we introduce the outperformance score function, a universal standardization method for confusion-matrix-based classification performance (CMBCP) metrics. It maps any given metric to a common scale of [0,1], while providing a clear and consistent interpretation. Specifically, the outperformance score represents the percentile rank of the observed classification performance within a reference distribution of possible performances. This unified framework enables meaningful comparison and monitoring of classification performance across test sets with differing imbalance rates. We illustrate how the outperformance score can be applied to a variety of commonly used classification performance metrics and demonstrate the robustness of our method through experiments on real-world datasets spanning multiple classification applications.

Subjects: Machine Learning , Machine Learning , Methodology

Publish: 2025-05-11 16:07:14 UTC
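The abstract defines the outperformance score as a percentile rank within a reference distribution of possible performances but does not specify that distribution. A minimal sketch, assuming (hypothetically) that every confusion matrix with the observed numbers of positives and negatives is equally likely under the reference distribution:

```python
def accuracy(tp, fn, fp, tn):
    return (tp + tn) / (tp + fn + fp + tn)


def balanced_accuracy(tp, fn, fp, tn):
    # Requires at least one positive and one negative in the test set
    return 0.5 * (tp / (tp + fn) + tn / (tn + fp))


def outperformance_score(metric, tp, fn, fp, tn):
    """Percentile rank of the observed metric value among all confusion matrices
    sharing the test set's class counts, each weighted equally (assumed reference)."""
    n_pos, n_neg = tp + fn, fp + tn
    observed = metric(tp, fn, fp, tn)
    reference = [
        metric(t, n_pos - t, n_neg - u, u)     # t true positives, u true negatives
        for t in range(n_pos + 1)
        for u in range(n_neg + 1)
    ]
    return sum(r <= observed for r in reference) / len(reference)
```

Any metric computed from (tp, fn, fp, tn) can be plugged in, and the result always lies in [0, 1]. Counting ties as outperformed (`<=`) is one convention; a mid-rank rule would count ties as one half.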


#24 Semiparametric Weighted Spline Regression (SWSR) in Confirmatory Clinical Trials with Time-Varying Placebo Effects

Authors: Tianyu Zhan, Yihua Gu

In confirmatory Phase 3 clinical trials whose recruitment spans several years, the underlying placebo effect may follow an unknown temporal trend. Taking a clinical trial in Hidradenitis Suppurativa (HS) as an example, fluctuations are common in HS-related endpoints, mainly due to the natural characteristics of the disease, variation in evaluations across physicians, and the evolving standard of care. The adjustment for time-varying placebo effects has received some attention in adaptive clinical trials and platform trials, but is usually ignored in traditional non-adaptive designs. However, under such a time drift, some existing methods may fail to simultaneously control the type I error rate and achieve satisfactory power. In this article, we propose SWSR (Semiparametric Weighted Spline Regression), which estimates the treatment effect while using B-splines to accommodate time-varying placebo effects nonparametrically. Our method aims to achieve three objectives: proper type I error rate control under varying settings, high overall power to detect a potential treatment effect, and robustness to unknown time-varying placebo effects. Simulation studies and a case study provide supporting evidence. These three key features make SWSR an appealing option to pre-specify in practical confirmatory clinical trials. Supplemental materials, including the R code, additional simulation results and theoretical discussion, are available online.

Subject: Methodology

Publish: 2025-05-11 11:05:36 UTC
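The core modelling idea, a treatment indicator plus a B-spline function of calendar time fitted by weighted least squares, can be sketched as follows. The abstract does not give SWSR's weighting scheme, knot placement or inference procedure, so `weights` and `interior_knots` are hypothetical inputs, and this is not the authors' R implementation.

```python
import numpy as np


def bspline_basis(x, interior_knots, lo, hi, degree=3):
    """Clamped B-spline basis evaluated at x via the Cox-de Boor recursion."""
    t = np.concatenate([np.repeat(lo, degree + 1), np.asarray(interior_knots, dtype=float),
                        np.repeat(hi, degree + 1)])
    x = np.asarray(x, dtype=float)
    # Degree 0: indicator of the half-open knot interval [t_i, t_{i+1})
    B = np.zeros((x.size, len(t) - 1))
    for i in range(len(t) - 1):
        if t[i] < t[i + 1]:
            B[:, i] = (x >= t[i]) & (x < t[i + 1])
    # Close the last non-empty interval so that x == hi is covered
    last = max(i for i in range(len(t) - 1) if t[i] < t[i + 1])
    B[x == hi, last] = 1.0
    # Raise the degree one step at a time
    for d in range(1, degree + 1):
        nxt = np.zeros((x.size, len(t) - 1 - d))
        for i in range(len(t) - 1 - d):
            if t[i + d] > t[i]:
                nxt[:, i] += (x - t[i]) / (t[i + d] - t[i]) * B[:, i]
            if t[i + d + 1] > t[i + 1]:
                nxt[:, i] += (t[i + d + 1] - x) / (t[i + d + 1] - t[i + 1]) * B[:, i + 1]
        B = nxt
    return B


def weighted_spline_treatment_effect(y, treat, time, weights, interior_knots):
    """Weighted least squares of y on a treatment indicator plus a B-spline time trend.

    The spline columns sum to one, so they also play the role of the intercept.
    Returns the estimated treatment coefficient.
    """
    lo, hi = float(np.min(time)), float(np.max(time))
    X = np.column_stack([treat, bspline_basis(time, interior_knots, lo, hi)])
    sw = np.sqrt(np.asarray(weights, dtype=float))
    coef, *_ = np.linalg.lstsq(X * sw[:, None], np.asarray(y, dtype=float) * sw, rcond=None)
    return coef[0]


# Usage on noise-free synthetic data with a cubic placebo drift and a true effect of 2
time = np.linspace(0.0, 10.0, 200)
treat = np.arange(200) % 2
y = 1.0 + 0.05 * (time - 5.0) ** 3 - time + 2.0 * treat
est = weighted_spline_treatment_effect(y, treat, time, np.ones(200), [2.5, 5.0, 7.5])
```

Because cubic B-splines span every cubic polynomial on the fitted interval, this drift is absorbed exactly by the spline columns and `est` recovers the treatment effect up to rounding; real trials add noise and need the authors' weighting and testing procedure.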


#25 Accelerated inference for stochastic compartmental models with over-dispersed partial observations

Author: Michael Whitehouse

An assumed-density approximate likelihood is derived for a class of partially observed stochastic compartmental models that permit observational over-dispersion. This is achieved by treating time-varying reporting probabilities as latent variables and integrating them out using Laplace approximations within Poisson Approximate Likelihoods (LawPAL), resulting in a fast deterministic approximation to the marginal likelihood and filtering distributions. We derive an asymptotically exact filtering result in the large-population regime, demonstrating the approximation's ability to recover latent disease states and reporting probabilities. Through simulations we 1) demonstrate favorable behavior of the maximum approximate likelihood estimator, in terms of ground-truth recovery, in the regime of large population and long time horizon; and 2) demonstrate order-of-magnitude computational speed gains over a sequential Monte Carlo likelihood-based approach, and explore the statistical compromises our approximation implicitly makes. We conclude by embedding our methodology within the probabilistic programming language Stan for automated Bayesian inference, developing a model of practical interest using data from the COVID-19 outbreak in Switzerland.

Subject: Methodology

Publish: 2025-05-11 10:50:21 UTC
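A key step in LawPAL is integrating a latent reporting probability out with a Laplace approximation. The sketch below isolates that step for a single hypothetical observation y ~ Poisson(lam * q) with logit(q) ~ N(mu, s^2), and checks it against brute-force quadrature. The filtering recursion over compartments and the paper's exact parametrization are not reproduced; all names and parameter values are illustrative assumptions.

```python
import math


def log_sigmoid(z):
    """Numerically stable log(1 / (1 + exp(-z)))."""
    return -math.log1p(math.exp(-z)) if z >= 0 else z - math.log1p(math.exp(z))


def log_joint(z, y, lam, mu, s):
    """log p(y, z) with y ~ Poisson(lam * q), q = sigmoid(z), z ~ N(mu, s^2)."""
    log_q = log_sigmoid(z)
    return (y * (math.log(lam) + log_q) - lam * math.exp(log_q) - math.lgamma(y + 1)
            - 0.5 * ((z - mu) / s) ** 2 - math.log(s * math.sqrt(2.0 * math.pi)))


def _grad_hess(z, y, lam, mu, s):
    """First and second derivatives of log_joint with respect to z."""
    q = math.exp(log_sigmoid(z))
    grad = (1.0 - q) * (y - lam * q) - (z - mu) / s ** 2
    hess = -y * q * (1.0 - q) - lam * q * (1.0 - q) * (1.0 - 2.0 * q) - 1.0 / s ** 2
    return grad, hess


def laplace_log_marginal(y, lam, mu, s, iters=50):
    """Laplace approximation to log p(y), the log of the integral of p(y, z) over z.

    Assumes log_joint is concave around its mode (true for the example below).
    """
    z = mu
    for _ in range(iters):
        grad, hess = _grad_hess(z, y, lam, mu, s)
        step = -grad / hess                              # Newton step towards the mode
        # Backtrack so that the log joint never decreases
        while abs(step) > 1e-12 and log_joint(z + step, y, lam, mu, s) < log_joint(z, y, lam, mu, s):
            step *= 0.5
        z += step
    _, hess = _grad_hess(z, y, lam, mu, s)
    return log_joint(z, y, lam, mu, s) + 0.5 * math.log(2.0 * math.pi) - 0.5 * math.log(-hess)


def grid_log_marginal(y, lam, mu, s, half_width=10.0, n=40001):
    """Brute-force Riemann-sum check of the same integral on a fine grid."""
    h = 2.0 * half_width / (n - 1)
    vals = [log_joint(mu - half_width + i * h, y, lam, mu, s) for i in range(n)]
    m = max(vals)
    return m + math.log(h * sum(math.exp(v - m) for v in vals))


mu = math.log(0.3 / 0.7)                  # prior median reporting probability of 0.3
approx = laplace_log_marginal(40, 100.0, mu, 0.5)
exact = grid_log_marginal(40, 100.0, mu, 0.5)
```

For a well-informed latent variable like this one the two values should agree closely, while the Laplace version needs only a handful of Newton steps instead of tens of thousands of kernel evaluations, which is the kind of saving that makes a deterministic approximation attractive inside a filter.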