Computation

Date: Thu, 9 May 2024 | Total: 4

#1 Fast Exact/Conservative Monte Carlo Confidence Intervals

Authors: Amanda K. Glazer ; Philip B. Stark

Monte Carlo tests about parameters can be "inverted" to form confidence sets: the confidence set comprises all hypothesized values of the parameter that are not rejected at level $\alpha$. When the tests are exact or conservative -- as some families of such tests are -- so are the confidence sets. Because the validity of confidence sets depends only on the significance level of the test of the true null, every null can be tested using the same Monte Carlo sample, substantially reducing the computational burden of constructing confidence sets: the computation count is $O(n)$, where $n$ is the number of data. The Monte Carlo sample can be arbitrarily small, although the highest nontrivial attainable confidence level generally increases as the number of Monte Carlo replicates increases. When the parameter is real-valued and the $P$-value is quasiconcave in that parameter, it is straightforward to find the endpoints of the confidence interval using bisection in a conservative way. For some test statistics, values for different simulations and parameter values have a simple relationship that makes more savings possible. An open-source Python implementation of the approach for the one-sample and two-sample problems is available.
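
The inversion idea can be illustrated with a minimal sketch. It is not the authors' implementation: the sign-flip test of symmetry about `theta`, the toy data, and the names `p_value` and `endpoint` are all assumptions chosen for illustration. One matrix of random signs is drawn once and reused for every hypothesized value, and each endpoint is found by bisection that returns the rejected side of the bracket, so the interval errs on the wide (conservative) side. The bisection presumes the $P$-value is quasiconcave in `theta`.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(loc=2.0, scale=1.0, size=20)           # toy data (hypothetical)
signs = rng.choice([-1.0, 1.0], size=(999, x.size))   # ONE Monte Carlo sample, reused for every null


def p_value(theta):
    """Monte Carlo sign-flip P-value for H0: the data are symmetric about theta."""
    centered = x - theta
    observed = abs(centered.mean())
    simulated = np.abs((signs * centered).mean(axis=1))
    # The +1 terms count the observed statistic itself, which keeps the test conservative.
    return (1 + np.sum(simulated >= observed)) / (signs.shape[0] + 1)


def endpoint(rejected, accepted, alpha, tol=1e-6):
    """Bisect between a rejected and a non-rejected value; return the rejected side."""
    while abs(accepted - rejected) > tol:
        mid = 0.5 * (rejected + accepted)
        if p_value(mid) > alpha:
            accepted = mid   # mid is inside the confidence set
        else:
            rejected = mid   # mid is outside the confidence set
    return rejected


alpha = 0.05
center = x.mean()                 # the P-value is largest near the sample mean
span = x.max() - x.min()
lower = endpoint(x.min() - span, center, alpha)
upper = endpoint(x.max() + span, center, alpha)
print(f"{1 - alpha:.0%} confidence interval: [{lower:.3f}, {upper:.3f}]")
```

Returning the rejected side widens the interval by at most `tol` at each end, which trades a negligible loss of precision for guaranteed coverage of every non-rejected value between the brackets.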

#2 gasmodel: An R Package for Generalized Autoregressive Score Models

Author: Vladimír Holý

Generalized autoregressive score (GAS) models are a class of observation-driven time series models that employ the score to dynamically update time-varying parameters of the underlying probability distribution. GAS models have been extensively studied and numerous variants have been proposed in the literature to accommodate diverse data types and probability distributions. This paper introduces the gasmodel package, which has been designed to facilitate the estimation, forecasting, and simulation of a wide range of GAS models. The package provides a rich selection of distributions, offers flexible options for specifying dynamics, and allows exogenous variables to be incorporated. Model estimation utilizes the maximum likelihood method.
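
The score-driven update can be sketched generically as follows. This does not use or reproduce the gasmodel API (the package is written in R); the Poisson distribution, the log link, the unit score scaling, and the function name `gas_poisson_filter` are assumptions made for illustration. The time-varying parameter $f_t$ is the log-intensity, and each step moves it in the direction of the score of the latest observation.

```python
import numpy as np


def gas_poisson_filter(y, omega, alpha, phi, f0):
    """Filter a log-intensity path f_t with a first-order score-driven recursion.

    Assumed model (illustrative): y_t ~ Poisson(exp(f_t)) and
    f_{t+1} = omega + alpha * score_t + phi * f_t, with unit score scaling.
    """
    f = np.empty(len(y) + 1)
    f[0] = f0
    for t, y_t in enumerate(y):
        # Score of the Poisson log-density with respect to f_t.
        score = y_t - np.exp(f[t])
        f[t + 1] = omega + alpha * score + phi * f[t]
    return f


counts = np.array([3, 0, 2, 5, 1])   # toy count data (hypothetical)
path = gas_poisson_filter(counts, omega=0.1, alpha=0.05, phi=0.9, f0=0.5)
print(np.exp(path))                  # filtered intensities
```

With `alpha = 0` the observations have no influence and the recursion reduces to a deterministic autoregression, which is a convenient sanity check.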

#3 Fast Computation of Leave-One-Out Cross-Validation for $k$-NN Regression

Author: Motonobu Kanagawa

We describe a fast computation method for leave-one-out cross-validation (LOOCV) for $k$-nearest neighbours ($k$-NN) regression. We show that, under a tie-breaking condition for nearest neighbours, the LOOCV estimate of the mean square error for $k$-NN regression is identical to the mean square error of $(k+1)$-NN regression evaluated on the training data, multiplied by the scaling factor $(k+1)^2/k^2$. Therefore, to compute the LOOCV score, one needs to fit $(k+1)$-NN regression only once, and does not need to repeat training-validation of $k$-NN regression once for each training data point. Numerical experiments confirm the validity of the fast computation method.
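
The stated identity can be checked numerically with a small sketch. The brute-force `knn_predict` helper, the toy data, and the function names are assumptions for illustration, not the author's code. The data are continuous and random, so distances are distinct with probability one and no tie-breaking rule is needed.

```python
import numpy as np


def knn_predict(X_train, y_train, X_query, k):
    """Brute-force k-NN regression: average the targets of the k nearest training points."""
    dists = np.linalg.norm(X_query[:, None, :] - X_train[None, :, :], axis=2)
    nearest = np.argsort(dists, axis=1, kind="stable")[:, :k]
    return y_train[nearest].mean(axis=1)


def loocv_mse_naive(X, y, k):
    """LOOCV by refitting k-NN n times, each time holding out one point."""
    n = len(y)
    errors = np.empty(n)
    for i in range(n):
        keep = np.arange(n) != i
        pred = knn_predict(X[keep], y[keep], X[i:i + 1], k)[0]
        errors[i] = (y[i] - pred) ** 2
    return errors.mean()


def loocv_mse_fast(X, y, k):
    """LOOCV from a single (k+1)-NN fit on the training data, rescaled by (k+1)^2 / k^2."""
    fitted = knn_predict(X, y, X, k + 1)
    return ((k + 1) / k) ** 2 * np.mean((y - fitted) ** 2)


rng = np.random.default_rng(0)
X = rng.normal(size=(50, 2))                        # toy inputs (hypothetical)
y = np.sin(X[:, 0]) + 0.1 * rng.normal(size=50)     # toy targets (hypothetical)
naive = loocv_mse_naive(X, y, k=3)
fast = loocv_mse_fast(X, y, k=3)
print(naive, fast)
```

The two values agree because each training point is its own nearest neighbour in the $(k+1)$-NN fit, so its residual there is $k/(k+1)$ times its leave-one-out residual.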

#4 Weighted Particle-Based Optimization for Efficient Generalized Posterior Calibration

Author: Masahiro Tanaka

In the realm of statistical learning, the increasing volume of accessible data and increasing model complexity necessitate robust methodologies. This paper explores two branches of robust Bayesian methods in response to this trend. The first is generalized Bayesian inference, which introduces a learning rate parameter to enhance robustness against model misspecifications. The second is Gibbs posterior inference, which formulates inferential problems using generic loss functions rather than probabilistic models. In such approaches, it is necessary to calibrate the spread of the posterior distribution by selecting a learning rate parameter. The study aims to enhance the generalized posterior calibration (GPC) algorithm proposed by Syring and Martin (2019) [Biometrika, Volume 106, Issue 2, pp. 479-486]. Their algorithm chooses the learning rate to achieve the nominal frequentist coverage probability, but it is computationally intensive because it requires repeated posterior simulations for bootstrap samples. We propose a more efficient version of the GPC inspired by sequential Monte Carlo (SMC) samplers. A target distribution with a different learning rate is evaluated without posterior simulation as in the reweighting step in SMC sampling. Thus, the proposed algorithm can reach the desired value within a few iterations. This improvement substantially reduces the computational cost of the GPC. Its efficacy is demonstrated through synthetic and real data applications.
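
The reweighting step can be sketched on a toy problem. This illustrates only the importance-reweighting idea, not the proposed GPC algorithm; the Gaussian prior, the squared-error loss, and all names are assumptions for illustration. Writing the generalized posterior as $\pi_\eta(\theta) \propto \pi(\theta) \exp\{-\eta L(\theta)\}$, particles drawn at learning rate $\eta_0$ target $\eta_1$ after receiving weights proportional to $\exp\{-(\eta_1 - \eta_0) L(\theta)\}$, with no new posterior simulation.

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.normal(loc=1.0, scale=1.0, size=30)   # toy data (hypothetical)
tau2 = 100.0                                  # assumed prior: theta ~ N(0, tau2)


def loss(theta):
    """Squared-error loss summed over the data, evaluated for an array of theta values."""
    return 0.5 * ((x[None, :] - theta[:, None]) ** 2).sum(axis=1)


def exact_posterior(eta):
    """Mean and variance of the (Gaussian) generalized posterior at learning rate eta."""
    precision = 1.0 / tau2 + eta * x.size
    return eta * x.sum() / precision, 1.0 / precision


eta0, eta1 = 1.0, 1.5
mean0, var0 = exact_posterior(eta0)
particles = rng.normal(mean0, np.sqrt(var0), size=20_000)   # draws at learning rate eta0

# Reweight the eta0 particles so that they target the eta1 posterior.
log_w = -(eta1 - eta0) * loss(particles)
log_w -= log_w.max()                 # stabilise before exponentiating
w = np.exp(log_w)
w /= w.sum()

mean_est = np.sum(w * particles)
var_est = np.sum(w * (particles - mean_est) ** 2)
ess = 1.0 / np.sum(w ** 2)           # effective sample size of the weighted particles
mean1, var1 = exact_posterior(eta1)
print(mean_est, mean1, var_est, var1, ess)
```

The effective sample size indicates how far the learning rate can be moved before the weights degenerate and fresh posterior draws become necessary.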