Statistics Theory

2024-10-22 | Total: 13

#1 Nonparametric Bayesian networks are typically faithful in the total variation metric

Authors: Philip Boeken ; Patrick Forré ; Joris M. Mooij

We show that for a given DAG $G$, among all observational distributions of Bayesian networks over $G$ with arbitrary outcome spaces, the faithful distributions are "typical": they constitute a dense, open set with respect to the total variation metric. As a consequence, the set of faithful distributions is non-empty, and the unfaithful distributions are nowhere dense. We extend this result to the space of Bayesian networks, where the properties hold for the Bayesian networks themselves instead of their distributions. As special cases, we show that these results also hold for the faithful parameters of the subclasses of linear Gaussian and discrete Bayesian networks, giving a topological analogue of the measure-zero results of Spirtes et al. (1993) and Meek (1995). Finally, we extend our topological results and the measure-zero results of Spirtes et al. and Meek to Bayesian networks with latent variables.

Subjects: Statistics Theory ; Probability ; Machine Learning

Publish: 2024-10-21 13:38:04 UTC
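To see why unfaithfulness is the exceptional case in entry #1, consider the textbook path-cancellation example in a linear Gaussian network with edges X → Z → Y and X → Y. The sketch below (an illustration only, not the authors' construction; the coefficient names `a`, `b`, `c` and unit error variances are assumptions) computes Cov(X, Y) = b + a·c exactly. Choosing b = −a·c makes X and Y uncorrelated, an independence the DAG does not imply, while any small perturbation of b destroys it.

```python
import numpy as np


def covariance_matrix(a: float, b: float, c: float) -> np.ndarray:
    """Population covariance of (X, Z, Y) for the hypothetical linear Gaussian SEM
    X = e1, Z = a*X + e2, Y = b*X + c*Z + e3, with independent N(0, 1) errors."""
    # Structural equations V = B V + e  =>  V = (I - B)^{-1} e.
    B = np.array([[0.0, 0.0, 0.0],
                  [a,   0.0, 0.0],
                  [b,   c,   0.0]])
    A = np.linalg.inv(np.eye(3) - B)
    return A @ A.T  # Cov(e) is the identity


def cov_xy(a: float, b: float, c: float) -> float:
    """Cov(X, Y) = b + a*c: direct effect plus the effect through Z."""
    return float(covariance_matrix(a, b, c)[0, 2])


if __name__ == "__main__":
    print(cov_xy(2.0, -1.0, 0.5))   # paths cancel: unfaithful parameter
    print(cov_xy(2.0, -0.99, 0.5))  # tiny perturbation restores dependence
```

The cancelling parameters form the surface b = −a·c, a closed set with empty interior, which is the intuition behind the "nowhere dense" statement.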

#2 The mutual arrangement of Wright-Fisher diffusion path measures and its impact on parameter estimation

Author: Paul A. Jenkins

The Wright-Fisher diffusion is a fundamentally important model of evolution encompassing genetic drift, mutation, and natural selection. Suppose you want to infer the parameters associated with these processes from an observed sample path. To write down the likelihood, one first needs to know the mutual arrangement of two path measures under different parametrizations, that is, whether they are absolutely continuous, equivalent, singular, and so on. In this paper we give a complete answer to this question by finding the separating times for the diffusion: the stopping times before which one measure is absolutely continuous with respect to the other and after which the pair is mutually singular. In one dimension, this extends a classical result of Dawson on the local equivalence between neutral and non-neutral Wright-Fisher diffusion measures. Along the way, we also develop new zero-one type laws for the diffusion on its approach to, and emergence from, the boundary. As an application, we derive an explicit expression for the joint maximum likelihood estimator of the mutation and selection parameters and show that its convergence properties are closely related to the separating time.

Subjects: Statistics Theory ; Probability ; Populations and Evolution

Publish: 2024-10-21 12:34:14 UTC
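For readers unfamiliar with the model in entry #2, the following is a minimal Euler-Maruyama sketch of a one-dimensional Wright-Fisher diffusion with mutation and genic selection. It assumes one common parametrization, dX = ½[θ₁(1 − X) − θ₂X + σX(1 − X)] dt + √(X(1 − X)) dW; the paper's conventions may differ, and the parameter names are illustrative. Clipping to [0, 1] is a crude boundary fix and is exactly where discretization is least faithful to the boundary behaviour the paper studies.

```python
import numpy as np


def simulate_wright_fisher(x0, theta1, theta2, sigma, t_max, n_steps, seed=0):
    """Euler-Maruyama path of an assumed Wright-Fisher SDE on [0, 1].

    theta1, theta2: mutation rates towards / away from the allele (assumed scaling)
    sigma: genic selection coefficient (assumed scaling)
    """
    rng = np.random.default_rng(seed)
    dt = t_max / n_steps
    x = np.empty(n_steps + 1)
    x[0] = x0
    for k in range(n_steps):
        xk = x[k]
        # Mutation pushes towards an interior equilibrium; selection is logistic in x.
        drift = 0.5 * (theta1 * (1.0 - xk) - theta2 * xk + sigma * xk * (1.0 - xk))
        # Genetic drift: variance x(1 - x) vanishes at the boundary.
        diffusion = np.sqrt(max(xk * (1.0 - xk), 0.0))
        step = xk + drift * dt + diffusion * np.sqrt(dt) * rng.standard_normal()
        x[k + 1] = min(max(step, 0.0), 1.0)  # keep the frequency in [0, 1]
    return x


if __name__ == "__main__":
    path = simulate_wright_fisher(0.3, 0.5, 0.5, 1.0, t_max=1.0, n_steps=1000)
    print(path[-1])
```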

#3 Quantiles and Quantile Regression on Riemannian Manifolds: a measure-transportation-based approach

Authors: Marc Hallin ; Hang Liu

Increased attention has recently been given to the statistical analysis of variables with values on nonlinear manifolds. A natural but nontrivial problem in that context is the definition of quantile concepts. We propose a solution for compact Riemannian manifolds without boundary; typical examples are polyspheres, hyperspheres, and toroidal manifolds equipped with their Riemannian metrics. Our concept of quantile function comes with a concept of distribution function and, in the empirical case, of ranks and signs. The absence of a canonical ordering is offset by resorting to the data-driven ordering induced by optimal transports. Theoretical properties, such as the uniform convergence of the empirical distribution function and of the conditional (and unconditional) quantile functions, and the distribution-freeness of ranks and signs, are established. Statistical inference applications, from goodness-of-fit to distribution-free rank-based testing, are numerous. Of particular importance is the case of quantile regression with directional or toroidal multiple output, which is given special attention in this paper. Extensive simulations are carried out to illustrate these novel concepts.

Subjects: Statistics Theory ; Geometric Topology ; Methodology

Publish: 2024-10-21 07:31:56 UTC
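The "data-driven ordering induced by optimal transports" in entry #3 can be sketched on the simplest compact manifold without boundary, the circle. Assuming (for illustration) that the uniform reference measure is discretized as n equispaced grid angles, the empirical distribution function assigns each observation the grid point it is matched to by the assignment minimizing total squared geodesic distance. The grid's starting angle and the function name are hypothetical choices, not the paper's.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment


def circular_ot_distribution(angles):
    """Empirical OT-based distribution function on the circle S^1 (sketch).

    Matches n sample angles to n equispaced reference angles, minimizing the
    total squared geodesic (arc-length) distance, and returns the reference
    angle assigned to each observation.
    """
    angles = np.mod(np.asarray(angles, dtype=float), 2 * np.pi)
    n = angles.size
    grid = 2 * np.pi * np.arange(n) / n            # discretized uniform reference
    diff = np.abs(angles[:, None] - grid[None, :])
    geodesic = np.minimum(diff, 2 * np.pi - diff)  # shortest arc between angles
    rows, cols = linear_sum_assignment(geodesic ** 2)
    assigned = np.empty(n)
    assigned[rows] = grid[cols]
    return assigned


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    sample = rng.vonmises(mu=0.0, kappa=2.0, size=8)
    print(circular_ot_distribution(sample))
```

Because the output is a permutation of a fixed grid, its distribution under i.i.d. sampling does not depend on the data-generating law, which is the mechanism behind the distribution-freeness of ranks and signs.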

#4 Volatility estimation from a view point of entropy

Authors: Jirô Akahori ; Ryuya Namba ; Atsuhito Watanabe

In the present paper, we first revisit the volatility estimation approach proposed by N. Kunitomo and S. Sato; second, we show that the volatility estimator proposed by P. Malliavin and M.E. Mancino can be understood within this approach in a unified way. Third, we introduce an alternative estimator that may overcome the inconsistency caused by the microstructure noise of the initial observation.

Subject: Statistics Theory

Publish: 2024-10-20 06:37:09 UTC
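Entry #4 refers to the Malliavin-Mancino Fourier estimator. A minimal sketch, assuming prices observed on an equispaced grid over [0, 2π]: compute the Fourier coefficients c_s = (1/2π) Σ_j e^{−i s t_j} Δp_j of the price increments for |s| ≤ N, and estimate the integrated variance as (2π)²/(2N+1) · Σ_{|s|≤N} |c_s|². The time rescaling to [0, 2π] and the function name are assumptions of this sketch, and it does not implement the paper's alternative estimator.

```python
import numpy as np


def fourier_integrated_variance(prices, n_freq):
    """Malliavin-Mancino-type Fourier estimate of integrated variance on [0, 2*pi].

    prices: observations at t_j = 2*pi*j/n, j = 0, ..., n (time rescaled, assumed)
    n_freq: cutoff N; frequencies s = -N, ..., N are used
    """
    prices = np.asarray(prices, dtype=float)
    n = prices.size - 1
    t_left = 2 * np.pi * np.arange(n) / n          # left endpoint of each increment
    dp = np.diff(prices)
    s = np.arange(-n_freq, n_freq + 1)
    # Fourier coefficients of the increment measure dp.
    coeffs = np.exp(-1j * np.outer(s, t_left)) @ dp / (2 * np.pi)
    # Zeroth Fourier coefficient of sigma^2 (times 2*pi) via the convolution formula.
    return (2 * np.pi) ** 2 / (2 * n_freq + 1) * np.sum(np.abs(coeffs) ** 2)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n, sigma = 401, 0.3
    dp = sigma * np.sqrt(2 * np.pi / n) * rng.standard_normal(n)
    prices = np.concatenate([[0.0], np.cumsum(dp)])
    print(fourier_integrated_variance(prices, 50), 2 * np.pi * sigma ** 2)
```

When 2N + 1 equals the number of increments, Parseval's identity makes this estimate coincide exactly with the realized variance Σ Δp_j²; taking N well below n filters high frequencies, which is the usual motivation for the Fourier approach under microstructure noise.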

#5 Polyspectral Mean Estimation of General Nonlinear Processes

Authors: Dhrubajyoti Ghosh ; Tucker McElroy ; Soumendra Lahiri

Higher-order spectra (or polyspectra), defined as the Fourier transforms of a stationary process's autocumulants, are useful in the analysis of nonlinear and non-Gaussian processes. Polyspectral means are weighted averages of the polyspectra over Fourier frequencies, and estimators can be constructed from analogous weighted averages of the higher-order periodogram (a statistic computed from the discrete Fourier transform of the data sample). We derive the asymptotic distribution of a class of polyspectral mean estimators, obtaining an exact expression for the limit distribution that depends on both the weighting function and the higher-order spectra. Second, we use bispectral means to define a new test of the linear process hypothesis. Simulations document the finite-sample properties of the asymptotic results. Two applications illustrate the utility of our results: we test the linear process hypothesis for a sunspot time series, and for Gross Domestic Product we conduct a clustering exercise based on bispectral means with different weight functions.

Subject: Statistics Theory

Publish: 2024-10-19 19:37:22 UTC
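A bispectral mean estimator of the kind described in entry #5 can be sketched as a weighted sum of the third-order periodogram I(λ_j, λ_k) = (2π)⁻² n⁻¹ d(λ_j) d(λ_k) d(−λ_j − λ_k), where d is the discrete Fourier transform of the sample. This sketch assumes the full n × n grid of Fourier frequencies on [0, 2π) and the normalization (2π/n)² Σ g·I; the paper may exclude degenerate frequencies or normalize differently.

```python
import numpy as np


def bispectral_mean(x, weight):
    """Weighted average of the third-order periodogram over all Fourier frequency pairs.

    weight: function g(lambda1, lambda2) evaluated elementwise on [0, 2*pi) grids.
    """
    x = np.asarray(x, dtype=float)
    n = x.size
    d = np.fft.fft(x)                      # d_j = sum_t x_t exp(-i * lambda_j * t)
    j = np.arange(n)
    J, K = np.meshgrid(j, j, indexing="ij")
    lam = 2 * np.pi * j / n                # Fourier frequencies lambda_j
    third = (-(J + K)) % n                 # index of the frequency -(lambda_j + lambda_k)
    periodogram = d[J] * d[K] * d[third] / ((2 * np.pi) ** 2 * n)
    w = weight(lam[J], lam[K])
    return (2 * np.pi / n) ** 2 * np.sum(w * periodogram)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x = rng.exponential(size=128)
    x = x - x.mean()                       # centre so the mean targets a cumulant
    print(bispectral_mean(x, lambda a, b: np.ones_like(a)))
```

A useful sanity check: with g ≡ 1, summing the DFT products over the full grid collapses to n² Σ x_t³, so the estimate equals the sample third moment, the empirical counterpart of the lag-(0, 0) third cumulant that the bispectrum integrates to.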

#6 Joint Probability Estimation of Many Binary Outcomes via Localized Adversarial Lasso

Authors: Alexandre Belloni ; Yan Chen ; Matthew Harding

In this work we consider estimating the probability of many (possibly dependent) binary outcomes, a problem at the core of many applications, e.g., multi-level treatments in causal inference and demand for bundles of products. Without further conditions, the probability distribution of an M-dimensional binary vector is characterized by a number of coefficients exponential in M ($2^M - 1$ free probabilities), which can lead to a high-dimensional problem even without covariates. Understanding the (in)dependence structure allows us to substantially improve the estimation, as it allows for an effective factorization of the probability distribution. To estimate the probability distribution of an M-dimensional binary vector, we leverage a Bahadur representation that connects the sparsity of its coefficients with independence across the components. We propose regularized and adversarial regularized estimators to obtain an estimator that adapts to the dependence structure, allowing the rates of convergence to depend on this intrinsic (lower) dimension. These estimators are needed to handle several challenges within this setting, including estimating nuisance parameters, estimating covariates, and nonseparable moment conditions. Our main results consider the presence of (low-dimensional) covariates, for which we propose a locally penalized estimator. We provide pointwise rates of convergence, addressing several issues in the theoretical analysis as we strive for a computationally tractable formulation. We apply our results to the estimation of causal effects with multiple binary treatments and show how our estimators can improve finite-sample performance compared with non-adaptive estimators that try to estimate all the probabilities directly. We also provide simulations that are consistent with our theoretical findings.

Subjects: Statistics Theory ; Methodology

Publish: 2024-10-19 17:35:12 UTC
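The Bahadur representation used in entry #6 writes a pmf on {0,1}^M as the product of its marginals times a correction term: p(x) = Π_i p_i^{x_i}(1 − p_i)^{1−x_i} · [1 + Σ_{|S|≥2} r_S Π_{i∈S} z_i], with z_i = (x_i − p_i)/√(p_i(1 − p_i)) and r_S = E[Π_{i∈S} z_i]. The sketch below computes these coefficients from a known pmf and reconstructs it; it only illustrates why independence means zero coefficients and is not the paper's (adversarial lasso) estimator. Function names are hypothetical.

```python
import itertools

import numpy as np


def bahadur_coefficients(pmf, M):
    """Marginals p_i and correlation coefficients r_S (|S| >= 2) of a pmf on {0,1}^M.

    pmf: dict mapping each tuple in {0,1}^M to its probability (marginals in (0, 1)).
    """
    outcomes = list(itertools.product([0, 1], repeat=M))
    probs = np.array([pmf[x] for x in outcomes])
    X = np.array(outcomes, dtype=float)
    p = probs @ X                                  # marginal P(X_i = 1)
    Z = (X - p) / np.sqrt(p * (1 - p))             # standardized components
    coeffs = {}
    for order in range(2, M + 1):
        for S in itertools.combinations(range(M), order):
            coeffs[S] = float(probs @ np.prod(Z[:, list(S)], axis=1))
    return p, coeffs


def bahadur_pmf(x, p, coeffs):
    """Evaluate the Bahadur expansion at a single outcome x."""
    x = np.asarray(x, dtype=float)
    base = np.prod(p ** x * (1 - p) ** (1 - x))    # independence part
    z = (x - p) / np.sqrt(p * (1 - p))
    correction = 1.0 + sum(r * np.prod(z[list(S)]) for S, r in coeffs.items())
    return base * correction


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    outcomes = list(itertools.product([0, 1], repeat=3))
    pmf = dict(zip(outcomes, rng.dirichlet(np.ones(8))))
    p, coeffs = bahadur_coefficients(pmf, 3)
    print(p, coeffs)
```

With M components there are 2^M − M − 1 correlation coefficients; when blocks of components are independent, every r_S whose index set straddles blocks vanishes, which is the sparsity the abstract exploits.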

#7 Implicit Regularization for Tubal Tensor Factorizations via Gradient Descent

Authors: Santhosh Karnik ; Anna Veselovska ; Mark Iwen ; Felix Krahmer

We provide a rigorous analysis of implicit regularization in an overparametrized tensor factorization problem beyond the lazy training regime. For matrix factorization problems, this phenomenon has been studied in a number of works. A particular challenge has been to design universal initialization strategies that provably lead to implicit regularization in gradient-descent methods. At the same time, Cohen et al. (2016) have argued that more general classes of neural networks can be captured by considering tensor factorizations. However, in the tensor case, implicit regularization has only been rigorously established for gradient flow or in the lazy training regime. In this paper, we prove the first tensor result of its kind for gradient descent rather than gradient flow. We focus on the tubal tensor product and the associated notion of low tubal rank, motivated by the relevance of this model for image data. We establish that gradient descent in an overparametrized tensor factorization model with a small random initialization exhibits an implicit bias towards solutions of low tubal rank. Our theoretical findings are illustrated in an extensive set of numerical simulations showcasing the dynamics predicted by our theory as well as the crucial role of using a small random initialization.

Subjects: Machine Learning ; Optimization and Control ; Statistics Theory ; Machine Learning

Publish: 2024-10-21 17:52:01 UTC
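The tubal product and tubal rank in entry #7 have a compact computational description: the t-product of A (n₁ × n₂ × n₃) and B (n₂ × n₄ × n₃) is a circular convolution of frontal slices along the third mode, so it becomes slice-wise matrix multiplication after an FFT along that mode. Tubal rank is then the largest rank among the Fourier-domain frontal slices. This sketch shows only the algebra, not the paper's gradient-descent analysis; function names are illustrative.

```python
import numpy as np


def t_product(A, B):
    """Tubal (t-)product: FFT along mode 3, multiply frontal slices, inverse FFT."""
    Af = np.fft.fft(A, axis=2)
    Bf = np.fft.fft(B, axis=2)
    Cf = np.einsum("ijk,jlk->ilk", Af, Bf)         # slice-wise matrix products
    return np.real(np.fft.ifft(Cf, axis=2))        # real inputs give a real result


def tubal_rank(A, tol=1e-10):
    """Maximum rank of the frontal slices in the Fourier domain."""
    Af = np.fft.fft(A, axis=2)
    return max(np.linalg.matrix_rank(Af[:, :, k], tol=tol) for k in range(A.shape[2]))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    U = rng.standard_normal((5, 2, 6))
    V = rng.standard_normal((2, 4, 6))
    print(tubal_rank(t_product(U, V)))
```

A factorization U * V with inner dimension r has tubal rank at most r, which is the low-tubal-rank structure that small random initialization is shown to favour.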

#8 On the Geometry of Regularization in Adversarial Training: High-Dimensional Asymptotics and Generalization Bounds

Authors: Matteo Vilucchio ; Nikolaos Tsilivis ; Bruno Loureiro ; Julia Kempe

Regularization, whether explicit in terms of a penalty in the loss or implicit in the choice of algorithm, is a cornerstone of modern machine learning. Indeed, controlling the complexity of the model class is particularly important when data is scarce, noisy or contaminated, as it reflects a statistical belief about the underlying structure of the data. This work investigates the question of how to choose the regularization norm $\lVert \cdot \rVert$ in the context of high-dimensional adversarial training for binary classification. To this end, we first derive an exact asymptotic description of the robust, regularized empirical risk minimizer for various types of adversarial attacks and regularization norms (including non-$\ell_p$ norms). We complement this analysis with a uniform convergence analysis, deriving bounds on the Rademacher complexity for this class of problems. Leveraging our theoretical results, we quantitatively characterize the relationship between perturbation size and the optimal choice of $\lVert \cdot \rVert$, confirming the intuition that, in the data-scarce regime, the type of regularization becomes increasingly important for adversarial training as perturbations grow in size.

Subjects: Machine Learning ; Disordered Systems and Neural Networks ; Machine Learning ; Statistics Theory

Publish: 2024-10-21 14:53:12 UTC
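One reason the norm choice in entry #8 matters: for a linear classifier and ℓ_p-bounded perturbations of size ε, the inner maximization of adversarial training has a closed form by Hölder's inequality, namely the margin y·wᵀx is reduced by ε‖w‖_q with 1/p + 1/q = 1. The attack geometry therefore acts as a dual-norm penalty on w. The sketch below (a standard fact, not the paper's asymptotic analysis) computes this robust logistic loss.

```python
import numpy as np


def dual_norm(w, p):
    """Norm dual to l_p: the l_q norm with 1/p + 1/q = 1."""
    if p == np.inf:
        return np.sum(np.abs(w))
    if p == 1:
        return np.max(np.abs(w))
    q = p / (p - 1)
    return np.sum(np.abs(w) ** q) ** (1 / q)


def robust_logistic_loss(w, X, y, eps, p):
    """Mean of max_{||delta||_p <= eps} log(1 + exp(-y * w.(x + delta))), y in {-1, +1}."""
    margins = y * (X @ w) - eps * dual_norm(w, p)  # worst case shrinks every margin
    return np.mean(np.logaddexp(0.0, -margins))    # numerically stable log(1 + e^{-m})


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.standard_normal((50, 5))
    y = rng.choice([-1.0, 1.0], size=50)
    w = rng.standard_normal(5)
    for p in (1, 2, np.inf):
        print(p, robust_logistic_loss(w, X, y, eps=0.1, p=p))
```

For p = ∞ the worst perturbation is δ = −ε·y·sign(w), so ℓ∞ attacks induce an ℓ1-type penalty, while ℓ2 attacks induce an ℓ2 penalty.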

#9 On the VC dimension of deep group convolutional neural networks

Authors: Anna Sepliarskaia ; Sophie Langer ; Johannes Schmidt-Hieber

We study the generalization capabilities of Group Convolutional Neural Networks (GCNNs) with ReLU activation function by deriving upper and lower bounds for their Vapnik-Chervonenkis (VC) dimension. Specifically, we analyze how factors such as the number of layers, weights, and input dimension affect the VC dimension. We further compare the derived bounds to those known for other types of neural networks. Our findings extend previous results on the VC dimension of continuous GCNNs with two layers, thereby providing new insights into the generalization properties of GCNNs, particularly regarding the dependence on the input resolution of the data.

Subjects: Machine Learning ; Statistics Theory ; Machine Learning

Publish: 2024-10-21 09:16:06 UTC

#10 A note on the sparse Hanson-Wright inequality

Authors: Yiyun He ; Ke Wang ; Yizhe Zhu

We obtain Hanson-Wright inequalities for the quadratic form of a random vector with independent sparse random variables. Specifically, we consider cases where the components of the random vector are sparse $\alpha$-sub-exponential random variables with $\alpha>0$. Our proof relies on a novel combinatorial approach to estimate the moments of the random quadratic form.

Subjects: Probability ; Combinatorics ; Statistics Theory

Publish: 2024-10-21 05:26:59 UTC
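The object controlled in entry #10 is the deviation of xᵀAx from its mean when x has independent sparse entries. A minimal Monte Carlo sketch, assuming the simplest instance x_i = δ_i ξ_i with δ_i ~ Bernoulli(p) and Rademacher ξ_i (bounded, hence α-sub-exponential for every α > 0): since E[x_i x_j] = 0 for i ≠ j and E[x_i²] = p, the mean is p·tr(A). The sampler name and parameters are illustrative, and the simulation does not reproduce the paper's tail bounds.

```python
import numpy as np


def sparse_quadratic_forms(A, p, n_samples, seed=0):
    """Draw n_samples values of x^T A x with x_i = Bernoulli(p) * Rademacher (sketch)."""
    rng = np.random.default_rng(seed)
    n = A.shape[0]
    mask = rng.random((n_samples, n)) < p                 # sparsity pattern
    signs = rng.choice([-1.0, 1.0], size=(n_samples, n))  # Rademacher signs
    X = mask * signs                                      # sparse entries in {-1, 0, 1}
    return np.einsum("si,ij,sj->s", X, A, X)              # one quadratic form per row


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    A = rng.standard_normal((30, 30))
    vals = sparse_quadratic_forms(A, p=0.1, n_samples=20000, seed=2)
    print(vals.mean(), 0.1 * np.trace(A))
    print(np.quantile(np.abs(vals - 0.1 * np.trace(A)), [0.5, 0.99]))
```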

#11 Distributionally Robust Instrumental Variables Estimation

Authors: Zhaonan Qu ; Yongchan Kwon

Instrumental variables (IV) estimation is a fundamental method in econometrics and statistics for estimating causal effects in the presence of unobserved confounding. However, challenges such as untestable model assumptions and poor finite-sample properties have undermined its reliability in practice. Viewing common issues in IV estimation as distributional uncertainties, we propose DRIVE, a distributionally robust framework for the classical IV estimation method. When the ambiguity set is based on a Wasserstein distance, DRIVE minimizes a square-root ridge regularized variant of the two-stage least squares (TSLS) objective. We develop a novel asymptotic theory for this regularized regression estimator based on the square-root ridge, showing that it achieves consistency without requiring the regularization parameter to vanish. This result follows from a fundamental property of the square-root ridge, which we call "delayed shrinkage". This novel property, which also holds for a class of generalized method of moments (GMM) estimators, ensures that the estimator is robust to distributional uncertainties that persist in large samples. We further derive the asymptotic distribution of Wasserstein DRIVE and propose data-driven procedures to select the regularization parameter based on theoretical results. Simulation studies confirm the superior finite-sample performance of Wasserstein DRIVE. Thanks to its regularization and robustness properties, Wasserstein DRIVE could be preferable in practice, particularly when the practitioner is uncertain about model assumptions or distributional shifts in data.

Subjects: Econometrics ; Optimization and Control ; Statistics Theory ; Machine Learning

Publish: 2024-10-21 04:33:38 UTC
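The following sketch contrasts TSLS with a square-root-ridge variant of its objective, as discussed in entry #11. The exact DRIVE objective is not given in the abstract; this sketch assumes the form √(mean ‖P_Z(Y − Xβ)‖²) + √(ρ(1 + ‖β‖²)), where P_Z projects onto the instruments. This form is a modelling assumption of the example. Unlike ridge, the square root is not a monotone transform of a squared-loss-plus-penalty objective, so its minimizer genuinely differs. With ρ = 0 it reduces to TSLS.

```python
import numpy as np
from scipy.optimize import minimize


def project(Z, v):
    """Orthogonal projection of v onto the column space of the instruments Z."""
    return Z @ np.linalg.lstsq(Z, v, rcond=None)[0]


def tsls(Y, X, Z):
    """Two-stage least squares: regress Y on the first-stage fitted values P_Z X."""
    return np.linalg.lstsq(project(Z, X), Y, rcond=None)[0]


def sqrt_ridge_tsls(Y, X, Z, rho, beta0=None):
    """Minimize sqrt(mean((P_Z Y - P_Z X b)^2)) + sqrt(rho * (1 + ||b||^2)) (assumed form)."""
    PY, PX = project(Z, Y), project(Z, X)

    def objective(beta):
        resid = PY - PX @ beta
        return np.sqrt(np.mean(resid ** 2)) + np.sqrt(rho * (1.0 + beta @ beta))

    x0 = np.zeros(X.shape[1]) if beta0 is None else beta0
    return minimize(objective, x0, method="BFGS").x


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n = 500
    Z = rng.standard_normal((n, 2))
    U = rng.standard_normal(n)                     # unobserved confounder
    X = (Z @ np.array([1.0, 0.5]) + U + rng.standard_normal(n))[:, None]
    Y = 2.0 * X[:, 0] + U + rng.standard_normal(n)
    print(tsls(Y, X, Z), sqrt_ridge_tsls(Y, X, Z, rho=0.01))
```

Because the penalty is increasing in ‖β‖, any exact minimizer with ρ > 0 has norm no larger than the TSLS solution.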

#12 Simultaneous Inference in Multiple Matrix-Variate Graphs for High-Dimensional Neural Recordings

Authors: Zongge Liu ; Heejong Bong ; Zhao Ren ; Matthew A. Smith ; Robert E. Kass

As large-scale neural recordings become common, many neuroscientific investigations focus on identifying functional connectivity from spatio-temporal measurements in two or more brain areas across multiple sessions. Spatio-temporal data in neural recordings can be represented as matrix-variate data, with time as the first dimension and space as the second. In this paper, we exploit the multiple matrix-variate Gaussian graphical model to encode the common underlying spatial functional connectivity across multiple sessions of neural recordings. By effectively integrating information across multiple graphs, we develop a novel inferential framework that allows simultaneous testing to detect meaningful connectivity for a target edge subset of arbitrary size. Our test statistics are based on a group penalized regression approach and a high-dimensional Gaussian approximation technique. The validity of simultaneous testing is demonstrated theoretically under mild assumptions on sample size and non-stationary autoregressive temporal dependence. Our test is nearly optimal in achieving the testable region boundary. Additionally, our method involves only convex optimization and parametric bootstrap, making it computationally attractive. We demonstrate the efficacy of the new method through both simulations and an experimental study involving multiple local field potential (LFP) recordings in the prefrontal cortex (PFC) and visual area V4 during a memory-guided saccade task.

Subjects: Methodology ; Statistics Theory

Publish: 2024-10-20 22:50:02 UTC
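The matrix-variate Gaussian model in entry #12 treats each T × S recording (time × space) as X = L_T Z L_Sᵀ with Z having i.i.d. standard normal entries, so vec(X) has covariance Σ_S ⊗ Σ_T. Spatial connectivity corresponds to the zero pattern of the spatial precision Ω_S = Σ_S⁻¹. As a hedged illustration (not the paper's group-penalized test), the sketch samples such data and recovers Σ_S with the moment estimator (1/(nT)) Σ_i X_iᵀ Σ_T⁻¹ X_i, which is unbiased when Σ_T is known. Known Σ_T and the example covariances are assumptions.

```python
import numpy as np


def sample_matrix_normal(sigma_t, sigma_s, n, rng):
    """Draw n matrices X_i = L_t Z_i L_s^T, each T x S (time x space)."""
    L_t = np.linalg.cholesky(sigma_t)
    L_s = np.linalg.cholesky(sigma_s)
    Z = rng.standard_normal((n, sigma_t.shape[0], sigma_s.shape[0]))
    return L_t @ Z @ L_s.T


def spatial_covariance_estimate(X, sigma_t):
    """(1/(nT)) * sum_i X_i^T Sigma_t^{-1} X_i, assuming the temporal covariance is known."""
    n, T, _ = X.shape
    W = np.linalg.inv(sigma_t) @ X                 # whitens the temporal dependence
    return np.einsum("nts,ntu->su", X, W) / (n * T)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    T, S, n = 20, 4, 500
    idx = np.arange(T)
    sigma_t = 0.5 ** np.abs(idx[:, None] - idx[None, :])   # AR(1)-type temporal covariance
    omega_s = 2 * np.eye(S) - 0.8 * (np.eye(S, k=1) + np.eye(S, k=-1))  # chain graph
    sigma_s = np.linalg.inv(omega_s)
    X = sample_matrix_normal(sigma_t, sigma_s, n, rng)
    print(np.round(np.linalg.inv(spatial_covariance_estimate(X, sigma_t)), 2))
```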

#13 High-dimensional prediction for count response via sparse exponential weights

Author: The Tien Mai

Count data is prevalent in various fields like ecology, medical research, and genomics. In high-dimensional settings, where the number of features exceeds the sample size, feature selection becomes essential. While frequentist methods like the Lasso have advanced in handling high-dimensional count data, Bayesian approaches remain under-explored, with no theoretical results on prediction performance. This paper introduces a novel probabilistic machine learning framework for high-dimensional count data prediction. We propose a pseudo-Bayesian method that integrates a scaled Student prior to promote sparsity and uses an exponential-weights aggregation procedure. A key contribution is a novel risk measure tailored to count data prediction, with theoretical guarantees for prediction risk using PAC-Bayesian bounds. Our results include non-asymptotic oracle inequalities, demonstrating rate-optimal prediction error without prior knowledge of sparsity. We implement this approach efficiently using the Langevin Monte Carlo method. Simulations and a real-data application highlight the strong performance of our method compared to the Lasso in various settings.

Subjects: Methodology ; Statistics Theory ; Machine Learning

Publish: 2024-10-20 12:45:42 UTC
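The exponential-weights approach in entry #13 targets a pseudo-posterior proportional to exp(−λ · empirical risk) × prior, which Langevin Monte Carlo can sample. The sketch below makes several assumptions: it uses the Poisson negative log-likelihood as the risk (the paper proposes its own risk measure), a scaled Student-type prior Π_j (τ² + β_j²)⁻², the unadjusted Langevin algorithm, and illustrative values for λ, τ, and the step size. Averaging post-burn-in iterates approximates the aggregated (posterior-mean) predictor.

```python
import numpy as np


def langevin_poisson_student(X, y, lam=1.0, tau=0.1, step=1e-3,
                             n_iter=3000, burn_in=1000, seed=0):
    """Unadjusted Langevin sampler for exp(-lam * PoissonNLL(beta)) * prod (tau^2 + beta_j^2)^(-2)."""
    rng = np.random.default_rng(seed)
    p = X.shape[1]
    beta = np.zeros(p)
    total = np.zeros(p)
    for it in range(n_iter):
        eta = X @ beta
        grad_loss = X.T @ (np.exp(eta) - y)              # gradient of the Poisson NLL
        grad_prior = 4.0 * beta / (tau ** 2 + beta ** 2) # minus gradient of the log-prior
        grad = lam * grad_loss + grad_prior
        beta = beta - step * grad + np.sqrt(2.0 * step) * rng.standard_normal(p)
        if it >= burn_in:
            total += beta                                # running sum for the average
    return total / (n_iter - burn_in)


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    X = 0.5 * rng.standard_normal((200, 10))
    beta_true = np.zeros(10)
    beta_true[:2] = [1.0, -0.5]
    y = rng.poisson(np.exp(X @ beta_true))
    print(np.round(langevin_poisson_student(X, y), 2))
```

The heavy-tailed prior has a sharp peak at zero with scale τ, which pulls irrelevant coefficients towards zero while barely shrinking large ones. This is how such priors promote sparsity without an explicit Lasso-type penalty.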