Statistics

Date: Thu, 9 May 2024 | Total: 32

#1 Causality Pursuit from Heterogeneous Environments via Neural Adversarial Invariance Learning [PDF] [Copy] [Kimi2]

Authors: Yihong Gu ; Cong Fang ; Peter Bühlmann ; Jianqing Fan

Statistics suffers from a fundamental problem, "the curse of endogeneity" -- the regression function, or more broadly the prediction risk minimizer with infinite data, may not be the target we wish to pursue. This is because when complex data are collected from multiple sources, the biases deviated from the interested (causal) association inherited in individuals or sub-populations are not expected to be canceled. Traditional remedies are of hindsight and restrictive in being tailored to prior knowledge like untestable cause-effect structures, resulting in methods that risk model misspecification and lack scalable applicability. This paper seeks to offer a purely data-driven and universally applicable method that only uses the heterogeneity of the biases in the data rather than following pre-offered commandments. Such an idea is formulated as a nonparametric invariance pursuit problem, whose goal is to unveil the invariant conditional expectation $m^\star(x)\equiv \mathbb{E}[Y^{(e)}|X_{S^\star}^{(e)}=x_{S^\star}]$ with unknown important variable set $S^\star$ across heterogeneous environments $e\in \mathcal{E}$. Under the structural causal model framework, $m^\star$ can be interpreted as certain data-driven causality in general. The paper contributes to proposing a novel framework, called Focused Adversarial Invariance Regularization (FAIR), formulated as a single minimax optimization program that can solve the general invariance pursuit problem. As illustrated by the unified non-asymptotic analysis, our adversarial estimation framework can attain provable sample-efficient estimation akin to standard regression under a minimal identification condition for various tasks and models. As an application, the FAIR-NN estimator realized by two Neural Network classes is highlighted as the first approach to attain statistically efficient estimation in general nonparametric invariance learning.

#2 Robust deep learning from weakly dependent data [PDF1] [Copy] [Kimi]

Authors: William Kengne ; Modou Wade

Recent developments on deep learning established some theoretical properties of deep neural networks estimators. However, most of the existing works on this topic are restricted to bounded loss functions or (sub)-Gaussian or bounded input. This paper considers robust deep learning from weakly dependent observations, with unbounded loss function and unbounded input/output. It is only assumed that the output variable has a finite $r$ order moment, with $r >1$. Non asymptotic bounds for the expected excess risk of the deep neural network estimator are established under strong mixing, and $\psi$-weak dependence assumptions on the observations. We derive a relationship between these bounds and $r$, and when the data have moments of any order (that is $r=\infty$), the convergence rate is close to some well-known results. When the target predictor belongs to the class of H\"older smooth functions with sufficiently large smoothness index, the rate of the expected excess risk for exponentially strongly mixing data is close to or as same as those for obtained with i.i.d. samples. Application to robust nonparametric regression and robust nonparametric autoregression are considered. The simulation study for models with heavy-tailed errors shows that, robust estimators with absolute loss and Huber loss function outperform the least squares method.

#3 Differentially Private Synthetic Data with Private Density Estimation [PDF] [Copy] [Kimi]

Authors: Nikolija Bojkovic ; Po-Ling Loh

The need to analyze sensitive data, such as medical records or financial data, has created a critical research challenge in recent years. In this paper, we adopt the framework of differential privacy, and explore mechanisms for generating an entire dataset which accurately captures characteristics of the original data. We build upon the work of Boedihardjo et al, which laid the foundations for a new optimization-based algorithm for generating private synthetic data. Importantly, we adapt their algorithm by replacing a uniform sampling step with a private distribution estimator; this allows us to obtain better computational guarantees for discrete distributions, and develop a novel algorithm suitable for continuous distributions. We also explore applications of our work to several statistical tasks.

#4 Fast Decentralized Gradient Tracking for Federated Minimax Optimization with Local Updates [PDF] [Copy] [Kimi]

Author: Chris Junchi Li

Federated learning (FL) for minimax optimization has emerged as a powerful paradigm for training models across distributed nodes/clients while preserving data privacy and model robustness on data heterogeneity. In this work, we delve into the decentralized implementation of federated minimax optimization by proposing \texttt{K-GT-Minimax}, a novel decentralized minimax optimization algorithm that combines local updates and gradient tracking techniques. Our analysis showcases the algorithm's communication efficiency and convergence rate for nonconvex-strongly-concave (NC-SC) minimax optimization, demonstrating a superior convergence rate compared to existing methods. \texttt{K-GT-Minimax}'s ability to handle data heterogeneity and ensure robustness underscores its significance in advancing federated learning research and applications.

#5 Wasserstein Proximal Coordinate Gradient Algorithms [PDF] [Copy] [Kimi]

Authors: Rentian Yao ; Xiaohui Chen ; Yun Yang

Motivated by approximation Bayesian computation using mean-field variational approximation and the computation of equilibrium in multi-species systems with cross-interaction, this paper investigates the composite geodesically convex optimization problem over multiple distributions. The objective functional under consideration is composed of a convex potential energy on a product of Wasserstein spaces and a sum of convex self-interaction and internal energies associated with each distribution. To efficiently solve this problem, we introduce the Wasserstein Proximal Coordinate Gradient (WPCG) algorithms with parallel, sequential and random update schemes. Under a quadratic growth (QC) condition that is weaker than the usual strong convexity requirement on the objective functional, we show that WPCG converges exponentially fast to the unique global optimum. In the absence of the QG condition, WPCG is still demonstrated to converge to the global optimal solution, albeit at a slower polynomial rate. Numerical results for both motivating examples are consistent with our theoretical findings.

#6 Data-driven Error Estimation: Upper Bounding Multiple Errors with No Technical Debt [PDF] [Copy] [Kimi]

Authors: Sanath Kumar Krishnamurthy ; Susan Athey ; Emma Brunskill

We formulate the problem of constructing multiple simultaneously valid confidence intervals (CIs) as estimating a high probability upper bound on the maximum error for a class/set of estimate-estimand-error tuples, and refer to this as the error estimation problem. For a single such tuple, data-driven confidence intervals can often be used to bound the error in our estimate. However, for a class of estimate-estimand-error tuples, nontrivial high probability upper bounds on the maximum error often require class complexity as input -- limiting the practicality of such methods and often resulting in loose bounds. Rather than deriving theoretical class complexity-based bounds, we propose a completely data-driven approach to estimate an upper bound on the maximum error. The simple and general nature of our solution to this fundamental challenge lends itself to several applications including: multiple CI construction, multiple hypothesis testing, estimating excess risk bounds (a fundamental measure of uncertainty in machine learning) for any training/fine-tuning algorithm, and enabling the development of a contextual bandit pipeline that can leverage any reward model estimation procedure as input (without additional mathematical analysis).

#7 Community detection in multi-layer bipartite networks [PDF] [Copy] [Kimi]

Author: Huan Qing

The problem of community detection in multi-layer undirected networks has received considerable attention in recent years. However, practical scenarios often involve multi-layer bipartite networks, where each layer consists of two distinct types of nodes. Existing community detection algorithms tailored for multi-layer undirected networks are not directly applicable to multi-layer bipartite networks. To address this challenge, this paper introduces a novel multi-layer degree-corrected stochastic co-block model specifically designed to capture the underlying community structure within multi-layer bipartite networks. Within this framework, we propose an efficient debiased spectral co-clustering algorithm for detecting nodes' communities. We establish the consistent estimation property of our proposed algorithm and demonstrate that an increased number of layers in bipartite networks improves the accuracy of community detection. Through extensive numerical experiments, we showcase the superior performance of our algorithm compared to existing methods. Additionally, we validate our algorithm by applying it to real-world multi-layer network datasets, yielding meaningful and insightful results.

#8 Persistent homology of featured time series data and its applications [PDF] [Copy] [Kimi]

Authors: Eunwoo Heo ; Jae-Hun Jung

Recent studies have actively employed persistent homology (PH), a topological data analysis technique, to analyze the topological information in time series data. Many successful studies have utilized graph representations of time series data for PH calculation. Given the diverse nature of time series data, it is crucial to have mechanisms that can adjust the PH calculations by incorporating domain-specific knowledge. In this context, we introduce a methodology that allows the adjustment of PH calculations by reflecting relevant domain knowledge in specific fields. We introduce the concept of featured time series, which is the pair of a time series augmented with specific features such as domain knowledge, and an influence vector that assigns a value to each feature to fine-tune the results of the PH. We then prove the stability theorem of the proposed method, which states that adjusting the influence vectors grants stability to the PH calculations. The proposed approach enables the tailored analysis of a time series based on the graph representation methodology, which makes it applicable to real-world domains. We consider two examples to verify the proposed method's advantages: anomaly detection of stock data and topological analysis of music data.

#9 A general error analysis for randomized low-rank approximation with application to data assimilation [PDF] [Copy] [Kimi]

Authors: Alexandre Scotto Di Perrotolo ; Youssef Diouane ; Selime Gürol ; Xavier Vasseur

Randomized algorithms have proven to perform well on a large class of numerical linear algebra problems. Their theoretical analysis is critical to provide guarantees on their behaviour, and in this sense, the stochastic analysis of the randomized low-rank approximation error plays a central role. Indeed, several randomized methods for the approximation of dominant eigen- or singular modes can be rewritten as low-rank approximation methods. However, despite the large variety of algorithms, the existing theoretical frameworks for their analysis rely on a specific structure for the covariance matrix that is not adapted to all the algorithms. We propose a general framework for the stochastic analysis of the low-rank approximation error in Frobenius norm for centered and non-standard Gaussian matrices. Under minimal assumptions on the covariance matrix, we derive accurate bounds both in expectation and probability. Our bounds have clear interpretations that enable us to derive properties and motivate practical choices for the covariance matrix resulting in efficient low-rank approximation algorithms. The most commonly used bounds in the literature have been demonstrated as a specific instance of the bounds proposed here, with the additional contribution of being tighter. Numerical experiments related to data assimilation further illustrate that exploiting the problem structure to select the covariance matrix improves the performance as suggested by our bounds.

#10 Testing the Fairness-Improvability of Algorithms [PDF] [Copy] [Kimi]

Authors: Eric Auerbach ; Annie Liang ; Max Tabord-Meehan ; Kyohei Okumura

Many algorithms have a disparate impact in that their benefits or harms fall disproportionately on certain social groups. Addressing an algorithm's disparate impact can be challenging, however, because it is not always clear whether there exists an alternative more-fair algorithm that does not compromise on other key objectives such as accuracy or profit. Establishing the improvability of algorithms with respect to multiple criteria is of both conceptual and practical interest: in many settings, disparate impact that would otherwise be prohibited under US federal law is permissible if it is necessary to achieve a legitimate business interest. The question is how a policy maker can formally substantiate, or refute, this necessity defense. In this paper, we provide an econometric framework for testing the hypothesis that it is possible to improve on the fairness of an algorithm without compromising on other pre-specified objectives. Our proposed test is simple to implement and can incorporate any exogenous constraint on the algorithm space. We establish the large-sample validity and consistency of our test, and demonstrate its use empirically by evaluating a healthcare algorithm originally considered by Obermeyer et al. (2019). In this demonstration, we find strong statistically significant evidence that it is possible to reduce the algorithm's disparate impact without compromising on the accuracy of its predictions.

#11 Learning with Posterior Sampling for Revenue Management under Time-varying Demand [PDF] [Copy] [Kimi]

Authors: Kazuma Shimizu ; Junya Honda ; Shinji Ito ; Shinji Nakadai

This paper discusses the revenue management (RM) problem to maximize revenue by pricing items or services. One challenge in this problem is that the demand distribution is unknown and varies over time in real applications such as airline and retail industries. In particular, the time-varying demand has not been well studied under scenarios of unknown demand due to the difficulty of jointly managing the remaining inventory and estimating the demand. To tackle this challenge, we first introduce an episodic generalization of the RM problem motivated by typical application scenarios. We then propose a computationally efficient algorithm based on posterior sampling, which effectively optimizes prices by solving linear programming. We derive a Bayesian regret upper bound of this algorithm for general models where demand parameters can be correlated between time periods, while also deriving a regret lower bound for generic algorithms. Our empirical study shows that the proposed algorithm performs better than other benchmark algorithms and comparably to the optimal policy in hindsight. We also propose a heuristic modification of the proposed algorithm, which further efficiently learns the pricing policy in the experiments.

#12 Multi-fidelity Hamiltonian Monte Carlo [PDF] [Copy] [Kimi]

Authors: Dhruv V. Patel ; Jonghyun Lee ; Matthew W. Farthing ; Peter K. Kitanidis ; Eric F. Darve

Numerous applications in biology, statistics, science, and engineering require generating samples from high-dimensional probability distributions. In recent years, the Hamiltonian Monte Carlo (HMC) method has emerged as a state-of-the-art Markov chain Monte Carlo technique, exploiting the shape of such high-dimensional target distributions to efficiently generate samples. Despite its impressive empirical success and increasing popularity, its wide-scale adoption remains limited due to the high computational cost of gradient calculation. Moreover, applying this method is impossible when the gradient of the posterior cannot be computed (for example, with black-box simulators). To overcome these challenges, we propose a novel two-stage Hamiltonian Monte Carlo algorithm with a surrogate model. In this multi-fidelity algorithm, the acceptance probability is computed in the first stage via a standard HMC proposal using an inexpensive differentiable surrogate model, and if the proposal is accepted, the posterior is evaluated in the second stage using the high-fidelity (HF) numerical solver. Splitting the standard HMC algorithm into these two stages allows for approximating the gradient of the posterior efficiently, while producing accurate posterior samples by using HF numerical solvers in the second stage. We demonstrate the effectiveness of this algorithm for a range of problems, including linear and nonlinear Bayesian inverse problems with in-silico data and experimental data. The proposed algorithm is shown to seamlessly integrate with various low-fidelity and HF models, priors, and datasets. Remarkably, our proposed method outperforms the traditional HMC algorithm in both computational and statistical efficiency by several orders of magnitude, all while retaining or improving the accuracy in computed posterior statistics.

#13 Biology-inspired joint distribution neurons based on Hierarchical Correlation Reconstruction allowing for multidirectional neural networks [PDF] [Copy] [Kimi]

Author: Jarek Duda

Popular artificial neural networks (ANN) optimize parameters for unidirectional value propagation, assuming some guessed parametrization type like Multi-Layer Perceptron (MLP) or Kolmogorov-Arnold Network (KAN). In contrast, for biological neurons e.g. "it is not uncommon for axonal propagation of action potentials to happen in both directions" \cite{axon} - suggesting they are optimized to continuously operate in multidirectional way. Additionally, statistical dependencies a single neuron could model is not just (expected) value dependence, but entire joint distributions including also higher moments. Such agnostic joint distribution neuron would allow for multidirectional propagation (of distributions or values) e.g. $\rho(x|y,z)$ or $\rho(y,z|x)$ by substituting to $\rho(x,y,z)$ and normalizing. There will be discussed Hierarchical Correlation Reconstruction (HCR) for such neuron model: assuming $\rho(x,y,z)=\sum_{ijk} a_{ijk} f_i(x) f_j(y) f_k(z)$ type parametrization of joint distribution with polynomial basis $f_i$, which allows for flexible, inexpensive processing including nonlinearities, direct model estimation and update, trained through standard backpropagation or novel ways for such structure up to tensor decomposition. Using only pairwise (input-output) dependencies, its expected value prediction becomes KAN-like with trained activation functions as polynomials, can be extended by adding higher order dependencies through included products - in conscious interpretable way, allowing for multidirectional propagation of both values and probability densities.

#14 Some variation of COBRA in sequential learning setup [PDF] [Copy] [Kimi]

Authors: Aryan Bhambu ; Arabin Kumar Dey

This research paper introduces innovative approaches for multivariate time series forecasting based on different variations of the combined regression strategy. We use specific data preprocessing techniques which makes a radical change in the behaviour of prediction. We compare the performance of the model based on two types of hyper-parameter tuning Bayesian optimisation (BO) and Usual Grid search. Our proposed methodologies outperform all state-of-the-art comparative models. We illustrate the methodologies through eight time series datasets from three categories: cryptocurrency, stock index, and short-term load forecasting.

#15 Adaptive design of experiments methodology for noise resistance with unreplicated experiments [PDF] [Copy] [Kimi]

Authors: Lucas Caparini ; Gwynn J. Elfring ; Mauricio Ponga

A new gradient-based adaptive sampling method is proposed for design of experiments applications which balances space filling, local refinement, and error minimization objectives while reducing reliance on delicate tuning parameters. High order local maximum entropy approximants are used for metamodelling, which take advantage of boundary-corrected kernel density estimation to increase accuracy and robustness on highly clumped datasets, as well as conferring the resulting metamodel with some robustness against data noise in the common case of unreplicated experiments. Two-dimensional test cases are analyzed against full factorial and latin hypercube designs and compare favourably. The proposed method is then applied in a unique manner to the problem of adaptive spatial resolution in time-varying non-linear functions, opening up the possibility to adapt the method to solve partial differential equations.

#16 Inference With Combining Rules From Multiple Differentially Private Synthetic Datasets [PDF] [Copy] [Kimi]

Authors: Leila Nombo ; Anne-Sophie Charest

Differential privacy (DP) has been accepted as a rigorous criterion for measuring the privacy protection offered by random mechanisms used to obtain statistics or, as we will study here, synthetic datasets from confidential data. Methods to generate such datasets are increasingly numerous, using varied tools including Bayesian models, deep neural networks and copulas. However, little is still known about how to properly perform statistical inference with these differentially private synthetic (DIPS) datasets. The challenge is for the analyses to take into account the variability from the synthetic data generation in addition to the usual sampling variability. A similar challenge also occurs when missing data is imputed before analysis, and statisticians have developed appropriate inference procedures for this case, which we tend extended to the case of synthetic datasets for privacy. In this work, we study the applicability of these procedures, based on combining rules, to the analysis of DIPS datasets. Our empirical experiments show that the proposed combining rules may offer accurate inference in certain contexts, but not in all cases.

#17 Weighted Particle-Based Optimization for Efficient Generalized Posterior Calibration [PDF] [Copy] [Kimi]

Author: Masahiro Tanaka

In the realm of statistical learning, the increasing volume of accessible data and increasing model complexity necessitate robust methodologies. This paper explores two branches of robust Bayesian methods in response to this trend. The first is generalized Bayesian inference, which introduces a learning rate parameter to enhance robustness against model misspecifications. The second is Gibbs posterior inference, which formulates inferential problems using generic loss functions rather than probabilistic models. In such approaches, it is necessary to calibrate the spread of the posterior distribution by selecting a learning rate parameter. The study aims to enhance the generalized posterior calibration (GPC) algorithm proposed by Syring and Martin (2019) [Biometrika, Volume 106, Issue 2, pp. 479-486]. Their algorithm chooses the learning rate to achieve the nominal frequentist coverage probability, but it is computationally intensive because it requires repeated posterior simulations for bootstrap samples. We propose a more efficient version of the GPC inspired by sequential Monte Carlo (SMC) samplers. A target distribution with a different learning rate is evaluated without posterior simulation as in the reweighting step in SMC sampling. Thus, the proposed algorithm can reach the desired value within a few iterations. This improvement substantially reduces the computational cost of the GPC. Its efficacy is demonstrated through synthetic and real data applications.

#18 On Correlation and Prediction Interval Reduction [PDF] [Copy] [Kimi]

Authors: Romain Piaget-Rossel ; Valentin Rousson

Pearson's correlation coefficient is a popular statistical measure to summarize the strength of association between two continuous variables. It is usually interpreted via its square as percentage of variance of one variable predicted by the other in a linear regression model. It can be generalized for multiple regression via the coefficient of determination, which is not straightforward to interpret in terms of prediction accuracy. In this paper, we propose to assess the prediction accuracy of a linear model via the prediction interval reduction (PIR) by comparing the width of the prediction interval derived from this model with the width of the prediction interval obtained without this model. At the population level, PIR is one-to-one related to the correlation and the coefficient of determination. In particular, a correlation of 0.5 corresponds to a PIR of only 13%. It is also the one's complement of the coefficient of alienation introduced at the beginning of last century. We argue that PIR is easily interpretable and useful to keep in mind how difficult it is to make accurate individual predictions, an important message in the era of precision medicine and artificial intelligence. Different estimates of PIR are compared in the context of a linear model and an extension of the PIR concept to non-linear models is outlined.

#19 Dependence-based fuzzy clustering of functional time series [PDF] [Copy] [Kimi]

Authors: Angel Lopez-Oriona ; Ying Sun ; Han Lin Shang

Time series clustering is an important data mining task with a wide variety of applications. While most methods focus on time series taking values on the real line, very few works consider functional time series. However, functional objects frequently arise in many fields, such as actuarial science, demography or finance. Functional time series are indexed collections of infinite-dimensional curves viewed as random elements taking values in a Hilbert space. In this paper, the problem of clustering functional time series is addressed. To this aim, a distance between functional time series is introduced and used to construct a clustering procedure. The metric relies on a measure of serial dependence which can be seen as a natural extension of the classical quantile autocorrelation function to the functional setting. Since the dynamics of the series may vary over time, we adopt a fuzzy approach, which enables the procedure to locate each series into several clusters with different membership degrees. The resulting algorithm can group series generated from similar stochastic processes, reaching accurate results with series coming from a broad variety of functional models and requiring minimum hyperparameter tuning. Several simulation experiments show that the method exhibits a high clustering accuracy besides being computationally efficient. Two interesting applications involving high-frequency financial time series and age-specific mortality improvement rates illustrate the potential of the proposed approach.

#20 Guiding adaptive shrinkage by co-data to improve regression-based prediction and feature selection [PDF] [Copy] [Kimi]

Authors: Mark A. van de Wiel ; Wessel N. van Wieringen

The high dimensional nature of genomics data complicates feature selection, in particular in low sample size studies - not uncommon in clinical prediction settings. It is widely recognized that complementary data on the features, `co-data', may improve results. Examples are prior feature groups or p-values from a related study. Such co-data are ubiquitous in genomics settings due to the availability of public repositories. Yet, the uptake of learning methods that structurally use such co-data is limited. We review guided adaptive shrinkage methods: a class of regression-based learners that use co-data to adapt the shrinkage parameters, crucial for the performance of those learners. We discuss technical aspects, but also the applicability in terms of types of co-data that can be handled. This class of methods is contrasted with several others. In particular, group-adaptive shrinkage is compared with the better-known sparse group-lasso by evaluating feature selection. Finally, we demonstrate the versatility of the guided shrinkage methodology by showing how to `do-it-yourself': we integrate implementations of a co-data learner and the spike-and-slab prior for the purpose of improving feature selection in genetics studies.

#21 Fast Computation of Leave-One-Out Cross-Validation for $k$-NN Regression [PDF] [Copy] [Kimi]

Author: Motonobu Kanagawa

We describe a fast computation method for leave-one-out cross-validation (LOOCV) for $k$-nearest neighbours ($k$-NN) regression. We show that, under a tie-breaking condition for nearest neighbours, the LOOCV estimate of the mean square error for $k$-NN regression is identical to the mean square error of $(k+1)$-NN regression evaluated on the training data, multiplied by the scaling factor $(k+1)^2/k^2$. Therefore, to compute the LOOCV score, one only needs to fit $(k+1)$-NN regression only once, and does not need to repeat training-validation of $k$-NN regression for the number of training data. Numerical experiments confirm the validity of the fast computation method.

#22 A joint model for DHS and MICS surveys: Spatial modeling with anonymized locations [PDF] [Copy] [Kimi]

Authors: John Paige ; Geir-Arne Fuglstad ; Andrea Riebler

Anonymizing the GPS locations of observations can bias a spatial model's parameter estimates and attenuate spatial predictions when improperly accounted for, and is relevant in applications from public health to paleoseismology. In this work, we demonstrate that a newly introduced method for geostatistical modeling in the presence of anonymized point locations can be extended to account for more general kinds of positional uncertainty due to location anonymization, including both jittering (a form of random perturbations of GPS coordinates) and geomasking (reporting only the name of the area containing the true GPS coordinates). We further provide a numerical integration scheme that flexibly accounts for the positional uncertainty as well as spatial and covariate information. We apply the method to women's secondary education completion data in the 2018 Nigeria demographic and health survey (NDHS) containing jittered point locations, and the 2016 Nigeria multiple indicator cluster survey (NMICS) containing geomasked locations. We show that accounting for the positional uncertainty in the surveys can improve predictions in terms of their continuous rank probability score.

#23 Learning Structural Causal Models through Deep Generative Models: Methods, Guarantees, and Challenges [PDF] [Copy] [Kimi]

Authors: Audrey Poinsot ; Alessandro Leite ; Nicolas Chesneau ; Michèle Sébag ; Marc Schoenauer

This paper provides a comprehensive review of deep structural causal models (DSCMs), particularly focusing on their ability to answer counterfactual queries using observational data within known causal structures. It delves into the characteristics of DSCMs by analyzing the hypotheses, guarantees, and applications inherent to the underlying deep learning components and structural causal models, fostering a finer understanding of their capabilities and limitations in addressing different counterfactual queries. Furthermore, it highlights the challenges and open questions in the field of deep structural causal modeling. It sets the stages for researchers to identify future work directions and for practitioners to get an overview in order to find out the most appropriate methods for their needs.

#24 gasmodel: An R Package for Generalized Autoregressive Score Models [PDF] [Copy] [Kimi]

Author: Vladimír Holý

Generalized autoregressive score (GAS) models are a class of observation-driven time series models that employ the score to dynamically update time-varying parameters of the underlying probability distribution. GAS models have been extensively studied and numerous variants have been proposed in the literature to accommodate diverse data types and probability distributions. This paper introduces the gasmodel package, which has been designed to facilitate the estimation, forecasting, and simulation of a wide range of GAS models. The package provides a rich selection of distributions, offers flexible options for specifying dynamics, and allows to incorporate exogenous variables. Model estimation utilizes the maximum likelihood method.

#25 Uncertainty quantification in metric spaces [PDF] [Copy] [Kimi]

Authors: Gábor Lugosi ; Marcos Matabuena

This paper introduces a novel uncertainty quantification framework for regression models where the response takes values in a separable metric space, and the predictors are in a Euclidean space. The proposed algorithms can efficiently handle large datasets and are agnostic to the predictive base model used. Furthermore, the algorithms possess asymptotic consistency guarantees and, in some special homoscedastic cases, we provide non-asymptotic guarantees. To illustrate the effectiveness of the proposed uncertainty quantification framework, we use a linear regression model for metric responses (known as the global Fr\'echet model) in various clinical applications related to precision and digital medicine. The different clinical outcomes analyzed are represented as complex statistical objects, including multivariate Euclidean data, Laplacian graphs, and probability distributions.