Methodology

2025-07-17 | Total: 21

#1 Does K-fold CV based penalty perform variable selection or does it lead to $n^{1/2}$-consistency in Lasso?

Authors: Mayukh Choudhury, Debraj Das

Least absolute shrinkage and selection operator or Lasso, introduced by Tibshirani (1996), is one of the widely used regularization methods in regression. It is observed that the properties of Lasso vary wildly depending on the choice of the penalty parameter. The recent results of Lahiri (2021) suggest that, depending on the nature of the penalty parameter, Lasso can either be variable selection consistent or be $n^{1/2}$-consistent. However, practitioners generally implement Lasso by choosing the penalty parameter in a data-dependent way, the most popular being the K-fold cross-validation. In this paper, we explore the variable selection consistency and $n^{1/2}$-consistency of Lasso when the penalty is chosen based on K-fold cross-validation with K being fixed. We consider the fixed-dimensional heteroscedastic linear regression model and show that Lasso with K-fold cross-validation based penalty is $n^{1/2}$-consistent, but not variable selection consistent. We also establish the $n^{1/2}$-consistency of the K-fold cross-validation based penalty as an intermediate result. Additionally, as a consequence of $n^{1/2}$-consistency, we establish the validity of Bootstrap to approximate the distribution of the Lasso estimator based on K-fold cross-validation. We validate the Bootstrap approximation in finite samples based on a moderate simulation study. Thus, our results essentially justify the use of K-fold cross-validation in practice to draw inferences based on $n^{1/2}$-scaled pivotal quantities in Lasso regression.

Subject: Methodology

Publish: 2025-07-16 17:56:44 UTC
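
As a rough illustration of the procedure entry #1 studies, the sketch below picks a Lasso penalty by K-fold cross-validation with K fixed. It is a minimal sketch, not the authors' code: the coordinate-descent solver, the function names (`soft_threshold`, `lasso_cd`, `kfold_cv_lambda`), the penalty grid, and the simulated homoscedastic data are all assumptions made for illustration.

```python
import random

def soft_threshold(z, t):
    """Soft-thresholding operator (closed-form solution of the 1-D Lasso)."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0

def lasso_cd(X, y, lam, n_iter=100):
    """Minimise (1/(2n))*||y - X b||^2 + lam*||b||_1 by coordinate descent (no intercept)."""
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    resid = list(y)  # residuals y - X beta; beta starts at zero
    col_sq = [sum(row[j] ** 2 for row in X) / n for j in range(p)]
    for _ in range(n_iter):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue
            # (1/n) * x_j' (partial residual that excludes coordinate j)
            rho = sum(X[i][j] * resid[i] for i in range(n)) / n + col_sq[j] * beta[j]
            new = soft_threshold(rho, lam) / col_sq[j]
            delta = new - beta[j]
            if delta != 0.0:
                for i in range(n):
                    resid[i] -= X[i][j] * delta
                beta[j] = new
    return beta

def kfold_cv_lambda(X, y, lambdas, K=5, seed=0):
    """Return the penalty in `lambdas` with the smallest K-fold squared prediction error."""
    n = len(X)
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    folds = [idx[k::K] for k in range(K)]
    best_lam, best_err = None, float("inf")
    for lam in lambdas:
        err = 0.0
        for fold in folds:
            held = set(fold)
            X_tr = [X[i] for i in idx if i not in held]
            y_tr = [y[i] for i in idx if i not in held]
            beta = lasso_cd(X_tr, y_tr, lam)
            err += sum((y[i] - sum(b * x for b, x in zip(beta, X[i]))) ** 2 for i in fold)
        err /= n
        if err < best_err:
            best_lam, best_err = lam, err
    return best_lam, best_err

# Hypothetical simulated data: three predictors, the second one irrelevant.
rng = random.Random(1)
true_beta = [2.0, 0.0, -1.0]
X = [[rng.gauss(0, 1) for _ in range(3)] for _ in range(100)]
y = [sum(b * x for b, x in zip(true_beta, row)) + rng.gauss(0, 1) for row in X]
lambdas = [0.01, 0.05, 0.1, 0.5, 1.0]
best_lam, best_err = kfold_cv_lambda(X, y, lambdas, K=5)
beta_hat = lasso_cd(X, y, best_lam)  # refit on all data with the selected penalty
```

Inspecting `beta_hat` shows whether the coefficient of the irrelevant predictor is set exactly to zero at the selected penalty, which is the variable-selection question the abstract raises.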


#2 Assessing the Impact of Covariate Distribution and Positivity Violation on Weighting-Based Indirect Comparisons: a Simulation Study

Authors: Arnaud Serret-Larmande, Jérôme Lambert, Stéphane Gaudry, David Hajage

Population-Adjusted Indirect Comparisons (PAICs) are used to estimate treatment effects when direct comparisons are infeasible and individual patient data (IPD) are only available for one trial. Among PAIC methods, Matching-Adjusted Indirect Comparison (MAIC) is the most widely used. However, little is known about how MAIC performs under challenging conditions such as limited covariate overlap or markedly non-normal covariate distributions. We conducted a Monte Carlo simulation study comparing three estimators: (i) MAIC matching first moment (MAIC-1), (ii) MAIC matching first and second moments (MAIC-2), and (iii) a benchmark method leveraging full IPD -- Propensity Score Weighting (PSW). We examined eight scenarios ranging from ideal conditions to situations with positivity violations and non-normal (including bimodal) covariate distributions. We assessed both anchored and unanchored estimators and examined the impact of adjustment model misspecification. We also applied these estimators to real-world data from the AKIKI and AKIKI-2 trials, comparing renal replacement therapy strategies in critically ill patients. MAIC-1 demonstrated robust performance, remaining unbiased in the presence of moderate positivity violations and non-normal covariates, while MAIC-2 and PSW appeared more sensitive to positivity violations. All methods showed substantial bias when key confounders were omitted, emphasizing the importance of correct model specification. In real-world data, a consistent trend was found with MAIC-1 showing narrower confidence intervals with positivity violation. Our findings support the cautious use of unanchored MAICs and highlight MAIC-1's resilience across moderate violations of assumptions. However, the method's limited flexibility underscores the need for careful use in real-world settings.

Subjects: Methodology, Applications

Publish: 2025-07-16 13:53:01 UTC
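
To make the first-moment matching idea behind MAIC in entry #2 concrete, here is a minimal single-covariate sketch of the usual method-of-moments weighting: weights of the form exp(alpha * (x_i - target mean)) are found by minimising their sum, which forces the weighted IPD mean to equal the aggregate target mean. The function names, the damped Newton solver, and the covariate values are assumptions for illustration only, not the simulation code of the paper.

```python
import math

def maic_weights(x, target_mean, tol=1e-10, max_iter=100):
    """Weights w_i = exp(alpha * (x_i - target_mean)) whose weighted mean of x hits target_mean."""
    c = [xi - target_mean for xi in x]  # covariate centred at the aggregate mean

    def objective(a):
        return sum(math.exp(a * ci) for ci in c)

    alpha = 0.0
    for _ in range(max_iter):
        w = [math.exp(alpha * ci) for ci in c]
        grad = sum(ci * wi for ci, wi in zip(c, w))       # zero when moments match
        if abs(grad) < tol:
            break
        hess = sum(ci * ci * wi for ci, wi in zip(c, w))  # positive: objective is convex
        step = grad / hess
        t, q0 = 1.0, sum(w)
        # Backtracking keeps the Newton step from overshooting.
        while objective(alpha - t * step) > q0 and t > 1e-8:
            t /= 2.0
        alpha -= t * step
    return [math.exp(alpha * ci) for ci in c]

def effective_sample_size(w):
    """Approximate effective sample size: (sum w)^2 / sum w^2."""
    return sum(w) ** 2 / sum(wi * wi for wi in w)

# Hypothetical IPD covariate values and a hypothetical aggregate mean from the comparator trial.
x_ipd = [0.1, 0.4, 0.5, 0.9, 1.3, 1.8, 2.2]
target = 1.2
w = maic_weights(x_ipd, target)
weighted_mean = sum(wi * xi for wi, xi in zip(w, x_ipd)) / sum(w)
ess = effective_sample_size(w)
```

A small effective sample size relative to the number of patients is a common warning sign of limited covariate overlap, the situation the abstract refers to as positivity violation.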


#3 Overcoming Standardization: Revealing Hidden Age Patterns of Suicide with Spatiotemporal Models

Authors: J. Martín-Pozuelo, A. López-Quílez, X. Barber, M. Marco

Indirect standardization is widely used in disease mapping to control for confounding, but relies on restrictive assumptions that may bias estimates if violated. Using data on suicide-related emergency calls, this study highlights such limitations and proposes age-structured hierarchical Bayesian models as an alternative. These models incorporate space-time, space-age, and time-age interactions, allowing for more accurate estimation without strong assumptions. The results show improved model fit, especially when including age effects. The best model reveals a rising temporal trend (2017--2022), a nonlinear age pattern, and stronger risk increases among younger individuals compared to older ones.

Subject: Methodology

Publish: 2025-07-16 08:49:50 UTC


#4 Bootstrap prediction intervals for the age distribution of life-table death counts

Author: Han Lin Shang

We introduce a nonparametric bootstrap procedure based on a dynamic factor model to construct pointwise prediction intervals for period life-table death counts. The age distribution of death counts is an example of constrained data, which are nonnegative and have a constrained integral. A centered log-ratio transformation is used to remove the constraints. With a time series of unconstrained data, we introduce our bootstrap method to construct prediction intervals, thereby quantifying forecast uncertainty. The bootstrap method utilizes a dynamic factor model to capture both nonstationary and stationary patterns through a two-stage functional principal component analysis. To capture parameter uncertainty, the estimated principal component scores and model residuals are sampled with replacement. Using the age- and sex-specific life-table deaths for Australia and the United Kingdom, we study the empirical coverage probabilities and compare them with the nominal ones. The bootstrap method has superior interval forecast accuracy, especially for the one-step-ahead forecast horizon.

Subjects: Methodology, Applications

Publish: 2025-07-16 06:26:46 UTC
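
The centered log-ratio transformation mentioned in entry #4 can be sketched in a few lines. This is a generic illustration under stated assumptions: the function names and the five-group death counts are hypothetical, and strictly positive counts are assumed (zero counts would need separate handling before taking logs).

```python
import math

def clr(parts):
    """Centered log-ratio: log of each part minus the mean of the logs."""
    logs = [math.log(v) for v in parts]  # requires strictly positive parts
    mean_log = sum(logs) / len(logs)
    return [l - mean_log for l in logs]

def clr_inverse(z, total=1.0):
    """Map unconstrained values back to positive parts that sum to `total`."""
    m = max(z)  # subtract the maximum for numerical stability
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [total * v / s for v in e]

# Hypothetical life-table death counts for five age groups (radix 100000).
deaths = [500.0, 1500.0, 8000.0, 40000.0, 50000.0]
z = clr(deaths)                             # unconstrained values that sum to zero
back = clr_inverse(z, total=sum(deaths))    # recovers the original counts
```

Forecasting or resampling can then operate on the unconstrained values `z`, and `clr_inverse` restores nonnegative counts with the required total.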


#5 Regularized k-POD: Sparse k-means clustering for high-dimensional missing data

Authors: Xin Guan, Yoshikazu Terada

The classical k-means clustering, based on distances computed from all data features, cannot be directly applied to incomplete data with missing values. A natural extension of k-means to missing data, namely k-POD, uses only the observed entries for clustering and is both computationally efficient and flexible. However, for high-dimensional missing data including features irrelevant to the underlying cluster structure, the presence of such irrelevant features leads to the bias of k-POD in estimating cluster centers, thereby damaging its clustering effect. Nevertheless, the existing k-POD method performs well in low-dimensional cases, highlighting the importance of addressing the bias issue. To this end, in this paper, we propose a regularized k-POD clustering method that applies feature-wise regularization on cluster centers into the existing k-POD clustering. Such a penalty on cluster centers enables us to effectively reduce the bias of k-POD for high-dimensional missing data. To the best of our knowledge, our method is the first to mitigate bias in k-means-type clustering for high-dimensional missing data, while retaining the computational efficiency and flexibility. Simulation results verify that the proposed method effectively reduces bias and improves clustering performance. Applications to real-world single-cell RNA sequencing data further show the utility of the proposed method.

Subject: Methodology

Publish: 2025-07-16 03:50:20 UTC


#6 Bias reduction method for prior event rate ratio, with application to emergency department visit rates in patients with advanced cancer

Authors: Xiangmei Ma, Chetna Malhotra, Eric Andrew Finkelstein, Yin Bun Cheung

Objectives: Prior event rate ratio (PERR) is a promising approach to control confounding in observational and real-world evidence research. One of its assumptions is that occurrence of outcome events does not influence later event rate, or in other words, absence of 'event dependence'. This study proposes, evaluates and illustrates a bias reduction method when this assumption is violated. Study Design and Setting: We propose the conditional frailty method for implementation of PERR in the presence of event dependence and evaluate its performance by simulation. We demonstrate the use of the method with a study of emergency department visit rate and palliative care in patients with advanced cancer in Singapore. Results: Simulations showed that, in the presence of negative (positive) event dependence, the crude PERR estimate of treatment effect was biased towards (away from) the null value. The proposed method successfully reduced the bias, with median of absolute level of relative bias at about 5%. Dynamic random-intercept modelling revealed positive event dependence in emergency department visits among patients with advanced cancer. While conventional time-to-event regression analysis with covariate adjustment estimated higher rate of emergency department visits among palliative care recipients (HR=3.61, P<0.001), crude PERR estimate and the proposed PERR estimate were 1.45 (P=0.22) and 1.22 (P=0.57), respectively. Conclusions: The proposed bias reduction method mitigates the impact of violation of the PERR assumption of absence of event dependence. It allows broader application of the PERR approach.

Subjects: Methodology, Applications

Publish: 2025-07-16 03:14:09 UTC


#7 $R^2$ priors for Grouped Variance Decomposition in High-dimensional Regression

Authors: Javier Enrique Aguilar, David Kohns, Aki Vehtari, Paul-Christian Bürkner

We introduce the Group-$R^2$ decomposition prior, a hierarchical shrinkage prior that extends $R^2$-based priors to structured regression settings with known groups of predictors. By decomposing the prior distribution of the coefficient of determination $R^2$ in two stages, first across groups, then within groups, the prior enables interpretable control over model complexity and sparsity. We derive theoretical properties of the prior, including marginal distributions of coefficients, tail behavior, and connections to effective model complexity. Through simulation studies, we evaluate the conditions under which grouping improves predictive performance and parameter recovery compared to priors that do not account for groups. Our results provide practical guidance for prior specification and highlight both the strengths and limitations of incorporating grouping into $R^2$-based shrinkage priors.

Subjects: Methodology, Applications, Other Statistics

Publish: 2025-07-16 01:40:56 UTC


#8 A Relativity-Based Framework for Statistical Testing Guided by the Independence of Ancillary Statistics: Methodology and Nonparametric Illustrations

Authors: Albert Vexler, Douglas Landsittel

This paper introduces a decision-theoretic framework for constructing and evaluating test statistics based on their relationship with ancillary statistics, that is, quantities whose distributions remain fixed under the null and alternative hypotheses. Rather than focusing solely on maximizing discriminatory power, the proposed approach emphasizes reducing dependence between a test statistic and relevant ancillary structures. We show that minimizing such dependence can yield most powerful (MP) procedures. A Basu-type independence result is established, and we demonstrate that certain MP statistics also characterize the underlying data distribution. The methodology is illustrated through modifications of classical nonparametric tests, including the Shapiro-Wilk, Anderson-Darling, and Kolmogorov-Smirnov tests, as well as a test for the center of symmetry. Simulation studies highlight the power and robustness of the proposed procedures. The framework is computationally simple and offers a principled strategy for improving statistical testing.

Subjects: Methodology, Statistics Theory

Publish: 2025-07-16 00:27:12 UTC


#9 Bayesian multivariate models for bounded directional data

Authors: Joel Montesinos-Vazquez, Gabriel Núñez-Antonio

In some areas of knowledge there are data representing directions restricted to a specific range of values. Consequently, it is useful to have models for describing variables defined in subsets of the k-dimensional unit sphere. This need has led to the development of models such as the multivariate projected Gamma distribution. However, the proposal of multivariate models whose marginal variables are defined only in sections of the unit circle and with a flexible dependency structure is limited. In this work, we propose constructing multivariate models where each marginal variable is a circular variable defined only in the first quadrant of the unit circle. Our approach is based on the concept of copula functions. The inferences for the proposed models rely on generating samples of the posterior joint density of all parameters involved in the models. This is achieved by applying a conditional approach that allows inferences to be made using a two-stage sampling. The proposed methodology is illustrated with both simulated and real data.

Subjects: Methodology, Computation

Publish: 2025-07-15 22:54:26 UTC


#10 Fiducial Matching: Differentially Private Inference for Categorical Data

Authors: Ogonnaya Michael Romanus, Younes Boulaguiem, Roberto Molinari

The task of statistical inference, which includes the building of confidence intervals and tests for parameters and effects of interest to a researcher, is still an open area of investigation in a differentially private (DP) setting. Indeed, in addition to the randomness due to data sampling, DP delivers another source of randomness consisting of the noise added to protect an individual's data from being disclosed to a potential attacker. As a result of this convolution of noises, in many cases it is too complicated to determine the stochastic behavior of the statistics and parameters resulting from a DP procedure. In this work, we contribute to this line of investigation by employing a simulation-based matching approach, solved through tools from the fiducial framework, which aims to replicate the data generation pipeline (including the DP step) and retrieve an approximate distribution of the estimates resulting from this pipeline. For this purpose, we focus on the analysis of categorical (nominal) data that is common in national surveys, for which sensitivity is naturally defined, and on additive privacy mechanisms. We prove the validity of the proposed approach in terms of coverage and highlight its good computational and statistical performance for different inferential tasks in simulated and applied data settings.

Subjects: Methodology, Computation, Machine Learning

Publish: 2025-07-15 21:56:15 UTC


#11 Smooth tensor decomposition with application to ambulatory blood pressure monitoring data

Authors: Leyuan Qian, R. Nisha Aurora, Naresh M. Punjabi, Irina Gaynanova

Ambulatory blood pressure monitoring (ABPM) enables continuous measurement of blood pressure and heart rate over 24 hours and is increasingly used in clinical studies. However, ABPM data are often reduced to summary statistics, such as means or medians, which obscure temporal features like nocturnal dipping and individual chronotypes. Functional data analysis methods better capture these temporal dynamics but typically treat each ABPM measurement separately, limiting their ability to leverage correlations among matched measurements. In this work, we observe that aligning ABPM data along measurement type, time, and patient ID lends itself to a tensor representation--a multidimensional array. Although tensor learning has shown great potential in other fields, it has not been applied to ABPM data. Existing tensor learning approaches often lack temporal smoothing constraints, assume no missing data, and can be computationally demanding. To address these limitations, we propose a novel smooth tensor decomposition method that incorporates a temporal smoothing penalty and accommodates missing data. We also develop an automatic procedure for selecting the optimal smoothing parameter and tensor ranks. Simulation studies demonstrate that our method reliably reconstructs smooth temporal trends from noisy, incomplete data. Application to ABPM data from patients with concurrent obstructive sleep apnea and type 2 diabetes uncovers clinically relevant associations between patient characteristics and ABPM measurements, which are missed by summary-based approaches.

Subject: Methodology

Publish: 2025-07-15 20:47:16 UTC


#12 Model averaging in the space of probability distributions

Authors: Emmanouil Androulakis, Georgios I. Papayiannis, Athanasios N. Yannacopoulos

This work investigates the problem of model averaging in the context of measure-valued data. Specifically, we study aggregation schemes in the space of probability distributions metrized in terms of the Wasserstein distance. The resulting aggregate models, defined via Wasserstein barycenters, are optimally calibrated to empirical data. To enhance model performance, we employ regularization schemes motivated by the standard elastic net penalization, which is shown to consistently yield models enjoying sparsity properties. The consistency properties of the proposed averaging schemes with respect to sample size are rigorously established using the variational framework of Γ-convergence. The performance of the methods is evaluated through carefully designed synthetic experiments that assess behavior across a range of distributional characteristics and stress conditions. Finally, the proposed approach is applied to a real-world dataset of insurance losses - characterized by heavy-tailed behavior - to estimate the claim size distribution and the associated tail risk.

Subjects: Methodology, Computation, Machine Learning

Publish: 2025-07-15 20:41:57 UTC
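
For intuition about the Wasserstein barycenters used in entry #12, the one-dimensional case has a simple form: for empirical distributions on the real line with the same number of equally weighted atoms, the 2-Wasserstein barycenter is obtained by averaging the sorted samples (that is, averaging quantiles). The sketch below covers only this special case with fixed weights; the function name and sample values are hypothetical, and the paper's calibration and elastic-net regularization of the weights are not shown.

```python
def barycenter_1d(samples, weights):
    """2-Wasserstein barycenter of equal-size 1-D empirical distributions via quantile averaging."""
    n = len(samples[0])
    if any(len(s) != n for s in samples):
        raise ValueError("all samples must have the same number of points")
    if abs(sum(weights) - 1.0) > 1e-12:
        raise ValueError("weights must sum to one")
    ordered = [sorted(s) for s in samples]  # order statistics = empirical quantiles
    return [sum(w * s[i] for w, s in zip(weights, ordered)) for i in range(n)]

# Two hypothetical model outputs, each summarised by three sample points.
model_a = [2.0, 0.0, 1.0]
model_b = [10.0, 14.0, 12.0]
bary = barycenter_1d([model_a, model_b], [0.5, 0.5])
```

With equal weights the result is `[5.0, 6.5, 8.0]`: each barycenter point is the midpoint of the matching order statistics of the two inputs.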


#13 Bayesian wavelet shrinkage for low SNR data based on the Epanechnikov kernel

Authors: Fidel Aniano Causil Barrios, Alex Rodrigo dos Santos Sousa

Consider the univariate nonparametric regression model with additive Gaussian noise and the representation of the unknown regression function in terms of a wavelet basis. We propose a shrinkage rule to estimate the wavelet coefficients obtained by mixing a point mass function at zero with the Epanechnikov distribution as a prior for the coefficients. The proposed rule proved to be suitable for application in scenarios with low signal-to-noise ratio datasets and outperformed standard and Bayesian methods in simulation studies. Statistical properties, such as squared bias and variance, are provided, and an explicit expression of the rule is obtained. An application of the rule is demonstrated using a real EEG dataset.

Subject: Methodology

Publish: 2025-07-15 20:38:37 UTC


#14 Constructing targeted minimum loss/maximum likelihood estimators: a simple illustration to build intuition

Authors: Rachael K. Ross, Lina M. Montoya, Dana E. Goin, Ivan Diaz, Audrey Renson

Use of machine learning to estimate nuisance functions (e.g. outcomes models, propensity score models) in estimators used in causal inference is increasingly common, as it can mitigate bias due to model misspecification. However, it can be challenging to achieve valid inference (e.g., estimate valid confidence intervals). The efficient influence function (EIF) provides a recipe to go from a statistical estimand relevant to our causal question, to an estimator that can validly incorporate machine learning. Our companion paper, Renson et al. 2025 (arXiv:2502.05363), provides a thorough but approachable description of the EIF, along with a guide through the steps to go from a unique statistical estimand to development of one type of EIF-based estimator, the so-called one-step estimator. Another commonly used estimator based on the EIF is the targeted maximum likelihood/minimum loss estimator (TMLE). Construction of TMLEs is well-discussed in the statistical literature, but there remains a gap in translation to a more applied audience. In this letter, which supplements Renson et al., we provide a more accessible illustration of how to construct a TMLE.

Subject: Methodology

Publish: 2025-07-15 19:36:03 UTC


#15 Forecasting sub-population mortality using credibility theory

Authors: Mathias Lindholm, Gabriele Pittarello

The focus of the present paper is to forecast mortality rates for small sub-populations that are parts of a larger super-population. In this setting the assumption is that it is possible to produce reliable forecasts for the super-population, but the sub-populations may be too small or lack sufficient history to produce reliable forecasts if modelled separately. This setup is aligned with the ideas that underpin credibility theory, and in the present paper the classical credibility theory approach is extended to be able to handle the situation where future mortality rates are driven by a latent stochastic process, as is the case for, e.g., Lee-Carter type models. This results in sub-population credibility predictors that are weighted averages of expected future super-population mortality rates and expected future sub-population specific mortality rates. Due to the predictor's simple structure it is possible to derive an explicit expression for the mean squared error of prediction. Moreover, the proposed credibility modelling approach does not depend on the specific form of the super-population model, making it broadly applicable regardless of the chosen forecasting model for the super-population. The performance of the suggested sub-population credibility predictor is illustrated on simulated population data. These illustrations highlight how the credibility predictor serves as a compromise between only using a super-population model, and only using a potentially unreliable sub-population specific model.

Subjects: Applications, Methodology

Publish: 2025-07-16 15:23:09 UTC


#16 A Framework for Nonstationary Gaussian Processes with Neural Network Parameters

Authors: Zachary James, Joseph Guinness

Gaussian processes have become a popular tool for nonparametric regression because of their flexibility and uncertainty quantification. However, they often use stationary kernels, which limit the expressiveness of the model and may be unsuitable for many datasets. We propose a framework that uses nonstationary kernels whose parameters vary across the feature space, modeling these parameters as the output of a neural network that takes the features as input. The neural network and Gaussian process are trained jointly using the chain rule to calculate derivatives. Our method clearly describes the behavior of the nonstationary parameters and is compatible with approximation methods for scaling to large datasets. It is flexible and easily adapts to different nonstationary kernels without needing to redesign the optimization procedure. Our methods are implemented with the GPyTorch library and can be readily modified. We test a nonstationary variance and noise variant of our method on several machine learning datasets and find that it achieves better accuracy and log-score than both a stationary model and a hierarchical model approximated with variational inference. Similar results are observed for a model with only nonstationary variance. We also demonstrate our approach's ability to recover the nonstationary parameters of a spatial dataset.

Subjects: Machine Learning, Artificial Intelligence, Methodology, Machine Learning

Publish: 2025-07-16 14:09:49 UTC


#17 Fast Variational Bayes for Large Spatial Data

Authors: Jiafang Song, Abhirup Datta

Recent variational Bayes methods for geospatial regression, proposed as an alternative to computationally expensive Markov chain Monte Carlo (MCMC) sampling, have leveraged Nearest Neighbor Gaussian processes (NNGP) to achieve scalability. Yet, these variational methods remain inferior in accuracy and speed compared to spNNGP, the state-of-the-art MCMC-based software for NNGP. We introduce spVarBayes, a suite of fast variational Bayesian approaches for large-scale geospatial data analysis using NNGP. Our contributions are primarily computational. We replace auto-differentiation with a combination of calculus of variations, closed-form gradient updates, and linear response corrections for improved variance estimation. We also accommodate covariates (fixed effects) in the model and offer inference on the variance parameters. Simulation experiments demonstrate that we achieve comparable accuracy to spNNGP but with reduced computational costs, and considerably outperform existing variational inference methods in terms of both accuracy and speed. Analysis of a large forest canopy height dataset illustrates the practical implementation of proposed methods and shows that the inference results are consistent with those obtained from the MCMC approach. The proposed methods are implemented in publicly available Github R-package spVarBayes.

Subjects: Computation, Methodology, Machine Learning

Publish: 2025-07-16 13:59:27 UTC


#18 Data Synchronization at High Frequencies

Authors: Xinbing Kong, Cheng Liu, Bin Wu

Asynchronous trading in high-frequency financial markets introduces significant biases into econometric analysis, distorting risk estimates and leading to suboptimal portfolio decisions. Existing synchronization methods, such as the previous-tick approach, suffer from information loss and create artificial price staleness. We introduce a novel framework that recasts the data synchronization challenge as a constrained matrix completion problem. Our approach recovers the potential matrix of high-frequency price increments by minimizing its nuclear norm -- capturing the underlying low-rank factor structure -- subject to a large-scale linear system derived from observed, asynchronous price changes. Theoretically, we prove the existence and uniqueness of our estimator and establish its convergence rate. A key theoretical insight is that our method accurately and robustly leverages information from both frequently and infrequently traded assets, overcoming a critical difficulty of efficiency loss in traditional methods. Empirically, using extensive simulations and a large panel of S&P 500 stocks, we demonstrate that our method substantially outperforms established benchmarks. It not only achieves significantly lower synchronization errors, but also corrects the bias in systematic risk estimates (i.e., eigenvalues) and the estimate of betas caused by stale prices. Crucially, portfolios constructed using our synchronized data yield consistently and economically significant higher out-of-sample Sharpe ratios. Our framework provides a powerful tool for uncovering the true dynamics of asset prices, with direct implications for high-frequency risk management, algorithmic trading, and econometric inference.

Subjects: Econometrics, Methodology

Publish: 2025-07-16 13:25:50 UTC


#19 Enhancing Signal Proportion Estimation Through Leveraging Arbitrary Covariance Structures

Authors: Jingtian Bai, Xinge Jessie Jeng

Accurately estimating the proportion of true signals among a large number of variables is crucial for enhancing the precision and reliability of scientific research. Traditional signal proportion estimators often assume independence among variables and specific signal sparsity conditions, limiting their applicability in real-world scenarios where such assumptions may not hold. This paper introduces a novel signal proportion estimator that leverages arbitrary covariance dependence information among variables, thereby improving performance across a wide range of sparsity levels and dependence structures. Building on previous work that provides lower confidence bounds for signal proportions, we extend this approach by incorporating the principal factor approximation procedure to account for variable dependence. Our theoretical insights offer a deeper understanding of how signal sparsity, signal intensity, and covariance dependence interact. By comparing the conditions for estimation consistency before and after dependence adjustment, we highlight the advantages of integrating dependence information across different contexts. This theoretical foundation not only validates the effectiveness of the new estimator but also guides its practical application, ensuring reliable use in diverse scenarios. Through extensive simulations, we demonstrate that our method outperforms state-of-the-art estimators in both estimation accuracy and the detection of weaker signals that might otherwise go undetected.

Subjects: Statistics Theory, Methodology, Machine Learning

Publish: 2025-07-16 05:37:42 UTC


#20 Newfluence: Boosting Model interpretability and Understanding in High Dimensions

Authors: Haolin Zou, Arnab Auddy, Yongchan Kwon, Kamiar Rahnama Rad, Arian Maleki

The increasing complexity of machine learning (ML) and artificial intelligence (AI) models has created a pressing need for tools that help scientists, engineers, and policymakers interpret and refine model decisions and predictions. Influence functions, originating from robust statistics, have emerged as a popular approach for this purpose. However, the heuristic foundations of influence functions rely on low-dimensional assumptions where the number of parameters p is much smaller than the number of observations n. In contrast, modern AI models often operate in high-dimensional regimes with large p, challenging these assumptions. In this paper, we examine the accuracy of influence functions in high-dimensional settings. Our theoretical and empirical analyses reveal that influence functions cannot reliably fulfill their intended purpose. We then introduce an alternative approximation, called Newfluence, that maintains similar computational efficiency while offering significantly improved accuracy. Newfluence is expected to provide more accurate insights than many existing methods for interpreting complex AI models and diagnosing their issues. Moreover, the high-dimensional framework we develop in this paper can also be applied to analyze other popular techniques, such as Shapley values.

Subjects: Machine Learning, Machine Learning, Methodology

Publish: 2025-07-16 04:22:16 UTC


#21 Inference on Optimal Policy Values and Other Irregular Functionals via Smoothing

Authors: Justin Whitehouse, Morgane Austern, Vasilis Syrgkanis

Constructing confidence intervals for the value of an optimal treatment policy is an important problem in causal inference. Insight into the optimal policy value can guide the development of reward-maximizing, individualized treatment regimes. However, because the functional that defines the optimal value is non-differentiable, standard semi-parametric approaches for performing inference fail to be directly applicable. Existing approaches for handling this non-differentiability fall roughly into two camps. In one camp are estimators based on constructing smooth approximations of the optimal value. These approaches are computationally lightweight, but typically place unrealistic parametric assumptions on outcome regressions. In another camp are approaches that directly de-bias the non-smooth objective. These approaches don't place parametric assumptions on nuisance functions, but they either require the computation of intractably-many nuisance estimates, assume unrealistic $L^\infty$ nuisance convergence rates, or make strong margin assumptions that prohibit non-response to a treatment. In this paper, we revisit the problem of constructing smooth approximations of non-differentiable functionals. By carefully controlling first-order bias and second-order remainders, we show that a softmax smoothing-based estimator can be used to estimate parameters that are specified as a maximum of scores involving nuisance components. In particular, this includes the value of the optimal treatment policy as a special case. Our estimator obtains $\sqrt{n}$ convergence rates, avoids parametric restrictions/unrealistic margin assumptions, and is often statistically efficient.

Subjects: Econometrics, Machine Learning, Statistics Theory, Methodology

Publish: 2025-07-15 22:38:39 UTC
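
The smoothing idea in entry #21 can be illustrated with one standard smooth surrogate for a maximum: the softmax-weighted average of the scores, which is differentiable, never exceeds the true maximum, and approaches it as the temperature parameter grows. This is only a sketch of the smoothing step under assumptions of our own: the function name, the parameter `beta`, and the score values are hypothetical, and the de-biasing for nuisance components described in the abstract is not shown.

```python
import math

def softmax_smoothed_max(scores, beta):
    """Softmax-weighted average of `scores`; tends to max(scores) as beta grows."""
    m = max(scores)  # shift by the maximum for numerical stability
    weights = [math.exp(beta * (s - m)) for s in scores]
    total = sum(weights)
    return sum(s * w for s, w in zip(scores, weights)) / total

# Hypothetical per-treatment scores (e.g. estimated mean outcomes under each arm).
scores = [0.30, 0.35, 0.10]
betas = (1.0, 10.0, 100.0, 1000.0)
approx = [softmax_smoothed_max(scores, b) for b in betas]
```

Smaller values of `beta` give a smoother but more biased surrogate, while larger values track the maximum more closely at the cost of a sharper, less regular function, which is the trade-off a smoothing-based estimator has to balance.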