Methodology

2025-11-17 | Total: 11

#1 A Recursive Theory of Variational State Estimation: The Dynamic Programming Approach

Author: Filip Tronarp

In this article, variational state estimation is examined from the dynamic programming perspective. This leads to two different value functional recursions depending on whether backward or forward dynamic programming is employed. The result is a theory of variational state estimation that corresponds to the classical theory of Bayesian state estimation. More specifically, in the backward method, the value functional corresponds to a likelihood that is upper bounded by the state likelihood from the Bayesian backward recursion. In the forward method, the value functional corresponds to an unnormalized density that is upper bounded by the unnormalized filtering density. Both methods can be combined to arrive at a variational two-filter formula. Additionally, it is noted that optimal variational filtering is generally of quadratic time-complexity in the sequence length. This motivates the notion of sub-optimal variational filtering, which also lower bounds the evidence but is of linear time-complexity. Another problem is the fact that the value functional recursions are generally intractable. This is briefly discussed and a simple approximation is suggested that retrieves the filter proposed by Courts et al. (2021). The methodology is examined in a jump Gauss--Markov system, where it is observed that the value functional recursions are tractable under a certain factored Markov process approximation. A simulation study demonstrates that the posterior approximation is of adequate quality.

Subject: Methodology

Publish: 2025-11-14 17:19:44 UTC


#2 Estimating the Effects of Heatwaves on Health: A Causal Inference Framework

Authors: Giulio Grossi, Leo Vanciu, Veronica Ballerini, Danielle Braun, Falco J. Bargagli Stoffi

The harmful relationship between heatwaves and health has been extensively documented in medical and epidemiological literature. However, most evidence is associational and cannot be interpreted causally unless strong assumptions are made. In this paper, we first make explicit the assumptions underlying the statistical methods frequently used in the heatwave literature and demonstrate when these assumptions might break down in heatwave contexts. To address these shortcomings, we propose a causal inference framework that transparently elicits causal identification assumptions. Within this new framework, we first introduce synthetic controls (SC) for estimating heatwave effects, then propose a spatially augmented Bayesian synthetic control (SA-SC) method that accounts for spatial dependence and spillovers. Empirical Monte Carlo simulations show both methods perform well, with SA-SC reducing root mean squared error and improving posterior interval coverage under spillovers and spatial dependence. Finally, we apply the proposed methods to estimate the causal effects of heatwaves on Medicare heat-related hospitalizations among 13,753,273 beneficiaries residing in the Northeastern U.S. from 2000 to 2019. This causal inference framework provides spatially coherent counterfactual outcomes and robust, interpretable, and transparent causal estimates while explicitly addressing the unexamined assumptions in existing methods that pervade the heatwave effect literature.

Subjects: Methodology, Applications

Publish: 2025-11-14 16:01:49 UTC


#3 Model Class Selection

Authors: Ryan Cecil, Lucas Mentch

Classical model selection seeks to find a single model within a particular class that optimizes some pre-specified criteria, such as maximizing a likelihood or minimizing a risk. More recently, there has been an increased interest in model set selection (MSS), where the aim is to identify a (confidence) set of near-optimal models. Here, we generalize the MSS framework further by introducing the idea of model class selection (MCS). In MCS, multiple model collections are evaluated, and all collections that contain at least one optimal model are sought for identification. Under mild conditions, data splitting based approaches are shown to provide general solutions for MCS. As a direct consequence, for particular datasets we are able to investigate formally whether classes of simpler and more interpretable statistical models are able to perform on par with more complex black-box machine learning models. A variety of simulated and real-data experiments are provided.

Subjects: Methodology, Machine Learning

Publish: 2025-11-14 14:43:26 UTC


#4 Interpolated stochastic interventions based on propensity scores, target policies and treatment-specific costs

Author: Johan de Aguas

We introduce families of stochastic interventions for discrete treatments that connect causal modeling to cost-sensitive decision making. The interventions arise from a cost-penalized information projection of the independent product of the organic propensity and a user-specified target, yielding closed-form Boltzmann-Gibbs couplings. The induced marginals define modified stochastic policies that interpolate smoothly, via a single tilt parameter, from the organic law or from the target distribution toward a product-of-experts limit when all destination costs are strictly positive. One of these families recovers and extends incremental propensity score interventions, retaining identification without global positivity. For inference, we derive efficient influence functions under a nonparametric model for the expected outcomes after these policies and construct one-step estimators with uniform confidence bands. In simulations, the proposed estimators improve stability and robustness to nuisance misspecification relative to plug-in baselines. The framework can operationalize graded scientific hypotheses under realistic constraints: because inputs are modular, analysts can sweep feasible policy spaces, prototype candidates, and align interventions with budgets and logistics before committing experimental resources. This could help close the loop between observational evidence and resource-aware experimental design.
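As a point of reference for the incremental propensity score interventions mentioned above, the following is a minimal sketch of the standard binary-treatment version, in which the odds of treatment are multiplied by a factor `delta`. It is not the cost-penalized Boltzmann-Gibbs family from the abstract; the function name `incremental_shift` and the example propensities are illustrative assumptions.

```python
import numpy as np

def incremental_shift(pi, delta):
    """Shifted propensity whose treatment odds are `delta` times the odds of `pi`.

    Assumption: binary treatment, pi = P(A = 1 | X). This is the classical
    incremental propensity score shift, not the paper's general family.
    """
    pi = np.asarray(pi, dtype=float)
    return delta * pi / (delta * pi + 1.0 - pi)

# Hypothetical organic propensities, including a unit that is never treated.
pi = np.array([0.0, 0.1, 0.5, 0.9])
q = incremental_shift(pi, delta=2.0)

# A unit with pi = 0 keeps propensity 0 under the shift, which is why this
# kind of intervention does not need positivity to hold everywhere.
print(q)
```

Setting `delta = 1` returns the organic propensity unchanged, so a single parameter interpolates away from the observed treatment process.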

Subject: Methodology

Publish: 2025-11-14 14:38:34 UTC


#5 Extreme-PLS with missing data under weak dependence

Authors: Stéphane Girard, Cambyse Pakzad

This paper develops a theoretical framework for Extreme Partial Least Squares (EPLS) dimension reduction in the presence of missing data and weak temporal dependence. Building upon the recent EPLS methodology for modeling extremal dependence between a response variable and high-dimensional covariates, we extend the approach to more realistic data settings where both serial correlation and missingness occur. Specifically, we consider a single-index inverse regression model under heavy-tailed conditions and introduce a Missing-at-Random (MAR) mechanism acting on the covariates, whose probability depends on the extremeness of the response. The asymptotic behavior of the proposed estimator is established within an alpha-mixing framework, leading to consistency results under regularly varying tails. Extensive Monte-Carlo experiments covering eleven dependence schemes (including ARMA, GARCH, and nonlinear ESTAR processes) demonstrate that the method performs robustly across a wide range of heavy-tailed and dependent scenarios, even when substantial portions of data are missing. A real-world application to environmental data further confirms the method's capacity to recover meaningful tail directions.

Subjects: Methodology, Statistics Theory

Publish: 2025-11-14 14:23:02 UTC


#6 Online Spectral Density Estimation

Authors: Shahriar Hasnat Kazi, Niall Adams, Edward A. K. Cohen

This paper develops the first online algorithms for estimating the spectral density function -- a fundamental object of interest in time series analysis -- that satisfy the three core requirements of streaming inference: fixed memory, fixed computational complexity, and temporal adaptivity. Our method builds on the concept of forgetting factors, allowing the estimator to adapt to gradual or abrupt changes in the data-generating process without prior knowledge of its dynamics. We introduce a novel online forgetting-factor periodogram and show that, under stationarity, it asymptotically recovers the properties of its offline counterpart. Leveraging this, we construct an online Whittle estimator, and further develop an adaptive online spectral estimator that dynamically tunes its forgetting factor using the Whittle likelihood as a loss. Through extensive simulation studies and an application to ocean drifter velocity data, we demonstrate the method's ability to track time-varying spectral properties in real-time with strong empirical performance.
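To make the forgetting-factor idea concrete, here is a minimal sketch of an exponentially weighted periodogram that is updated recursively on a fixed frequency grid, so memory and per-step cost do not grow with the stream. The abstract does not give the authors' exact estimator; the class name `ForgettingPeriodogram`, the normalization by the sum of squared weights, and the fixed (non-adaptive) forgetting factor `lam` are all assumptions made for illustration.

```python
import numpy as np

class ForgettingPeriodogram:
    """Exponentially weighted periodogram on a fixed frequency grid (a sketch)."""

    def __init__(self, freqs, lam=0.99):
        self.freqs = np.asarray(freqs, dtype=float)   # angular frequencies
        self.lam = lam                                # forgetting factor in (0, 1]
        self.J = np.zeros(self.freqs.size, dtype=complex)
        self.norm = 0.0                               # running sum of squared weights
        self.t = 0

    def update(self, x):
        """Absorb one observation and return the current spectral estimate."""
        self.t += 1
        # Discount the past by lam, then add the new Fourier term.
        self.J = self.lam * self.J + x * np.exp(-1j * self.freqs * self.t)
        # Weights are lam**(t - s), so squared weights discount by lam**2.
        self.norm = self.lam**2 * self.norm + 1.0
        return np.abs(self.J) ** 2 / (2 * np.pi * self.norm)

# Usage on a simulated stream: a noisy sinusoid at angular frequency 1.0.
rng = np.random.default_rng(0)
freqs = np.linspace(0.1, 3.0, 30)
est = ForgettingPeriodogram(freqs, lam=0.99)
for t in range(1, 501):
    spec = est.update(np.cos(1.0 * t) + 0.1 * rng.standard_normal())
peak = freqs[np.argmax(spec)]
print(peak)
```

A smaller `lam` forgets faster and tracks changes more quickly at the cost of a noisier, lower-resolution estimate; the adaptive estimator described in the abstract tunes this trade-off automatically, which this sketch does not attempt.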

Subject: Methodology

Publish: 2025-11-14 13:30:58 UTC


#7 Knockoffs for low dimensions: changing the nominal level post-hoc to gain power while controlling the FDR

Authors: Lasse Fischer, Konstantinos Sechidis

Knockoffs are a powerful tool for controlled variable selection with false discovery rate (FDR) control. However, while they are frequently used in high-dimensional regressions, they lack power in low-dimensional and sparse signal settings. One of the main reasons is that knockoffs require a minimum number of selections, depending on the nominal FDR level. In this paper, we leverage e-values to allow the nominal level to be switched after looking at the data and applying the knockoff procedure. In this way, we can increase the nominal level in cases where the original knockoff procedure does not make any selections to potentially make discoveries. Also, in cases where the original knockoff procedure makes discoveries, we can often decrease the nominal level to increase the precision. These improvements come without any costs, meaning the results of our post-hoc knockoff procedure are always more informative than the results of the original knockoff procedure. Furthermore, we apply our technique to recently proposed derandomized knockoff procedures.
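The minimum-selection constraint mentioned above can be seen directly in the standard knockoff+ threshold rule. The sketch below implements that standard rule only, not the e-value based post-hoc procedure proposed in the paper; the function name `knockoff_plus_select` and the toy statistics `W` are illustrative assumptions.

```python
import numpy as np

def knockoff_plus_select(W, q):
    """Standard knockoff+ filter: select {j : W_j >= T}, where T is the smallest
    t with (1 + #{j : W_j <= -t}) / max(1, #{j : W_j >= t}) <= q."""
    W = np.asarray(W, dtype=float)
    # Candidate thresholds: the nonzero magnitudes of W, smallest first.
    for t in np.sort(np.unique(np.abs(W[W != 0]))):
        fdp_hat = (1 + np.sum(W <= -t)) / max(1, np.sum(W >= t))
        if fdp_hat <= q:
            return np.flatnonzero(W >= t)
    return np.array([], dtype=int)

# Toy example: five clearly positive statistics and no negative ones.
W = np.array([5.0, 4.0, 3.0, 2.0, 1.0])
sel_10 = knockoff_plus_select(W, q=0.1)   # needs at least 1 / 0.1 = 10 selections
sel_20 = knockoff_plus_select(W, q=0.2)   # needs at least 1 / 0.2 = 5 selections
print(sel_10.size, sel_20.size)
```

Because the numerator is at least 1, the estimated false discovery proportion can never fall below 1 divided by the number of selections, so with only five candidates nothing is selected at level 0.1. Simply rerunning the filter at a larger level after seeing the data is not valid in general; the e-value construction in the paper is what justifies such a switch.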

Subject: Methodology

Publish: 2025-11-14 11:05:37 UTC


#8 Multivariate longitudinal modeling of cross-sectional and lagged associations between a continuous time-varying endogenous covariate and a non-Gaussian outcome

Authors: Chiara Degan, Bart J. A. Mertens, Jelle Goeman, Nadine A. Ikelaar, Erik H. Niks, Pietro Spitali, Roula Tsonaka

In longitudinal studies, time-varying covariates are often endogenous, meaning their values depend on both their own history and that of the outcome variable. This violates key assumptions of Generalized Linear Mixed Effects Models (GLMMs), leading to biased and inconsistent estimates. Additionally, missing data and non-concurrent measurements between covariates and outcomes further complicate analysis, especially in rare or degenerative diseases where data is limited. To address these challenges, we propose an alternative use of two well-known multivariate models, each assuming a different form of the association. One induces the association by jointly modeling the random effects, called Joint Mixed Model (JMM); the other quantifies the association using a scaling factor, called Joint Scaled Model (JSM). We extend these models to accommodate continuous endogenous covariates and a wide range of longitudinal outcome types. A limitation in both cases is that the interpretation of the association is neither straightforward nor easy to communicate to scientists. Hence, we have numerically derived an association coefficient that measures the marginal relation between the outcome and the endogenous covariate. The proposed method provides interpretable, population-level estimates of cross-sectional associations (capturing relationships between covariates and outcomes measured at the same time point) and lagged associations (quantifying how past covariate values influence future outcomes), enabling clearer clinical insights. We fitted the JMM and JSM using a flexible Bayesian estimation approach, known as Integrated Nested Laplace Approximation (INLA), to overcome computation burden problems. These models will be presented along with the results of a simulation study and a natural history study on patients with Duchenne Muscular Dystrophy.

Subjects: Methodology, Applications

Publish: 2025-11-14 09:43:39 UTC


#9 Improving Variance and Confidence Interval Estimation in Small-Sample Propensity Score Analyses: Bootstrap vs. Asymptotic Methods

Authors: Baoshan Zhang, Sean M. O'Brien, Yuan Wu, Laine E. Thomas

Propensity score (PS) methods are widely used to estimate treatment effects in non-randomized studies. Variance is typically estimated using sandwich or bootstrap methods, which can either treat the PS as estimated or fixed. The latter is thought to be conservative. The sandwich and bootstrap estimators have been compared in moderate to large sample sizes, with results favoring the bootstrap estimator. With the growing interest in treatments for rare disease and externally controlled clinical trials, very small sample sizes are not uncommon and the asymptotic properties of sandwich estimators may not hold. Bootstrap methods that allow for PS re-estimation can also generate problems with quasi-separation in small samples. It is unclear whether it is safe to prefer sandwich estimators or to assume that treating the PS as fixed is conservative. We conducted a Monte Carlo simulation to compare the performance of bootstrap versus sandwich variance and CI estimators for average treatment effects estimated with PS methods. We systematically evaluated the impact of treating the PS as fixed versus re-estimating it. These methodological comparisons were performed using Inverse Probability of Treatment Weighting (IPTW) and Augmented Inverse Probability of Treatment Weighting (AIPW) estimators. Simulations assessed performance under various conditions, including small sample sizes and different outcome and treatment prevalences. We illustrate the differences in our motivating example, the LIMIT-JIA trial. We show that the sandwich estimators can perform quite poorly in small samples, and fixed PS methods are not necessarily conservative. A stratified bootstrap avoids quasi-separation and performs well. Differences were large enough to alter statistical conclusions in our motivating example, LIMIT-JIA.
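A bootstrap that re-estimates the PS in every resample can be sketched as follows. This is an illustration under stated assumptions, not the authors' simulation code: the stratification is taken to be by treatment arm, the IPTW estimator uses normalized (Hajek) weights, the logistic PS model is fit with a hand-written Newton-Raphson routine, and all function names and the simulated data are hypothetical.

```python
import numpy as np

def fit_logistic(X, a, iters=25, ridge=1e-6):
    """Newton-Raphson logistic regression; the tiny ridge only stabilizes the solve."""
    beta = np.zeros(X.shape[1])
    for _ in range(iters):
        p = 1.0 / (1.0 + np.exp(-X @ beta))
        grad = X.T @ (a - p) - ridge * beta
        hess = X.T @ (X * (p * (1 - p))[:, None]) + ridge * np.eye(X.shape[1])
        beta = beta + np.linalg.solve(hess, grad)
    return beta

def iptw_ate(X, a, y):
    """Normalized IPTW estimate of the ATE, with the PS estimated from (X, a)."""
    ps = 1.0 / (1.0 + np.exp(-X @ fit_logistic(X, a)))
    w1, w0 = a / ps, (1 - a) / (1 - ps)
    return np.sum(w1 * y) / np.sum(w1) - np.sum(w0 * y) / np.sum(w0)

def stratified_bootstrap_ci(X, a, y, n_boot=500, level=0.95, seed=0):
    """Percentile CI; resampling within each arm keeps both arms non-empty."""
    rng = np.random.default_rng(seed)
    treated, control = np.flatnonzero(a == 1), np.flatnonzero(a == 0)
    stats = np.empty(n_boot)
    for b in range(n_boot):
        idx = np.concatenate([rng.choice(treated, treated.size),
                              rng.choice(control, control.size)])
        stats[b] = iptw_ate(X[idx], a[idx], y[idx])   # PS re-estimated each time
    alpha = 1 - level
    return np.quantile(stats, [alpha / 2, 1 - alpha / 2])

# Simulated data with a true ATE of 1.0 (purely illustrative).
rng = np.random.default_rng(1)
n = 1000
x = rng.standard_normal(n)
X = np.column_stack([np.ones(n), x])
a = rng.binomial(1, 1.0 / (1.0 + np.exp(-0.5 * x)))
y = 1.0 * a + x + rng.standard_normal(n)
est = iptw_ate(X, a, y)
lo, hi = stratified_bootstrap_ci(X, a, y, n_boot=200)
print(est, lo, hi)
```

To treat the PS as fixed instead, one would compute `ps` once on the full sample and reuse the resampled weights inside the loop; the abstract reports that this is not necessarily conservative in small samples.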

Subject: Methodology

Publish: 2025-11-14 02:47:02 UTC


#10 Neighborhood Stability in Double/Debiased Machine Learning with Dependent Data

Authors: Jianfei Cao, Michael P. Leung

This paper studies double/debiased machine learning (DML) methods applied to weakly dependent data. We allow observations to be situated in a general metric space that accommodates spatial and network data. Existing work implements cross-fitting by excluding from the training fold observations sufficiently close to the evaluation fold. We find in simulations that this can result in exceedingly small training fold sizes, particularly with network data. We therefore seek to establish the validity of DML without cross-fitting, building on recent work by Chen et al. (2022). They study i.i.d. data and require the machine learner to satisfy a natural stability condition requiring insensitivity to data perturbations that resample a single observation. We extend these results to dependent data by strengthening stability to "neighborhood stability," which requires insensitivity to resampling observations in any slowly growing neighborhood. We show that existing results on the stability of various machine learners can be adapted to verify neighborhood stability.
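The shrinking-training-fold problem described above is easy to reproduce in a toy setting. The sketch below builds the training set for one evaluation fold by dropping every observation within a given radius of that fold. Euclidean distance on the unit square, random fold assignment, and the function name `training_indices` are assumptions for illustration; the paper allows a general metric space.

```python
import numpy as np

def training_indices(points, folds, k, radius):
    """Indices outside fold k that lie farther than `radius` from every point in fold k."""
    eval_mask = folds == k
    # Distance from each observation to its nearest neighbor in the evaluation fold.
    diff = points[:, None, :] - points[eval_mask][None, :, :]
    dist_to_fold = np.sqrt((diff ** 2).sum(axis=2)).min(axis=1)
    return np.flatnonzero(~eval_mask & (dist_to_fold > radius))

# 300 observations scattered on the unit square, split into 5 random folds.
rng = np.random.default_rng(0)
points = rng.uniform(size=(300, 2))
folds = rng.integers(0, 5, size=300)

# Training-fold size for evaluation fold 0 as the exclusion radius grows.
sizes = {r: training_indices(points, folds, 0, r).size for r in (0.0, 0.05, 0.1, 0.2)}
print(sizes)
```

With randomly interleaved folds, even a modest radius removes most of the remaining observations, which is the practical motivation the abstract gives for studying DML without cross-fitting.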

Subjects: Econometrics, Methodology

Publish: 2025-11-14 06:34:14 UTC


#11 Online Price Competition under Generalized Linear Demands

Authors: Daniele Bracale, Moulinath Banerjee, Cong Shi, Yuekai Sun

We study sequential price competition among $N$ sellers, each influenced by the pricing decisions of their rivals. Specifically, the demand function for each seller $i$ follows the single index model $\lambda_i(\mathbf{p}) = \mu_i(\langle \boldsymbol{\theta}_{i,0}, \mathbf{p} \rangle)$, with known increasing link $\mu_i$ and unknown parameter $\boldsymbol{\theta}_{i,0}$, where the vector $\mathbf{p}$ denotes the vector of prices offered by all the sellers simultaneously at a given instant. Each seller observes only their own realized demand -- unobservable to competitors -- and the prices set by rivals. Our framework generalizes existing approaches that focus solely on linear demand models. We propose a novel decentralized policy, PML-GLUCB, that combines penalized MLE with an upper-confidence pricing rule, removing the need for coordinated exploration phases across sellers -- which is integral to previous linear models -- and accommodating both binary and real-valued demand observations. Relative to a dynamic benchmark policy, each seller achieves $O(N^{2}\sqrt{T}\log(T))$ regret, which essentially matches the optimal rate known in the linear setting. A significant technical contribution of our work is the development of a variant of the elliptical potential lemma -- typically applied in single-agent systems -- adapted to our competitive multi-agent environment.

Subjects: Computer Science and Game Theory, Statistics Theory, Methodology

Publish: 2025-11-13 18:06:21 UTC