Statistics Theory

2025-01-17 | Total: 6

#1 Generative Models with ELBOs Converging to Entropy Sums

Authors: Jan Warnken, Dmytro Velychko, Simon Damm, Asja Fischer, Jörg Lücke

The evidence lower bound (ELBO) is one of the central objectives for probabilistic unsupervised learning. For the ELBOs of several generative models and model classes, we prove convergence to entropy sums. As one result, we provide a list of the generative models for which entropy convergence has been shown so far, along with the corresponding expressions for the entropy sums. Our considerations include very prominent generative models such as probabilistic PCA, sigmoid belief nets, and Gaussian mixture models. Beyond these, we treat further models and entire model classes, such as general mixtures of exponential family distributions. Our main contributions are the proofs for the individual models. For each given model, we show that the conditions stated in Theorem 1 or Theorem 2 of [arXiv:2209.03077] are fulfilled, so that, by virtue of these theorems, the model's ELBO is equal to an entropy sum at all stationary points. This equality at stationary points holds under realistic conditions: for finite numbers of data points, under model/data mismatch, at any stationary point (including saddle points), and for any well-behaved family of variational distributions.
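
For one of the listed models, a one-dimensional Gaussian mixture, the entropy-sum identity can be checked numerically. The sketch below is written in plain Python and assumes mixture weights pi_c, means mu_c and variances var_c, with q_n the variational distribution over components for data point n. With those assumptions, the ELBO evaluated right after each closed-form M-step should equal mean_n H[q_n] - H[pi] - sum_c pi_c * 0.5 * log(2*pi*e*var_c). In this simple case the identity follows from the parameter stationarity (M-step) conditions alone. All function names are hypothetical. This is an illustration under these assumptions, not the authors' code.

```python
import math
import random


def normal_logpdf(x, mu, var):
    """Log-density of N(mu, var) at x."""
    return -0.5 * math.log(2.0 * math.pi * var) - 0.5 * (x - mu) ** 2 / var


def entropy(probs):
    """Shannon entropy (nats) of a discrete distribution, with 0 log 0 = 0."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def m_step(xs, resp):
    """Closed-form maximizers of the ELBO over (pi, mu, var) for fixed q."""
    n, k = len(xs), len(resp[0])
    pis, mus, variances = [], [], []
    for c in range(k):
        nc = sum(r[c] for r in resp)
        mu = sum(r[c] * x for r, x in zip(resp, xs)) / nc
        var = sum(r[c] * (x - mu) ** 2 for r, x in zip(resp, xs)) / nc
        pis.append(nc / n)
        mus.append(mu)
        variances.append(var)
    return pis, mus, variances


def e_step(xs, pis, mus, variances):
    """Exact posteriors q_n(c) = p(c | x_n) under the current parameters."""
    resp = []
    for x in xs:
        logs = [math.log(p) + normal_logpdf(x, m, v)
                for p, m, v in zip(pis, mus, variances)]
        top = max(logs)  # subtract the max for numerical stability
        weights = [math.exp(l - top) for l in logs]
        total = sum(weights)
        resp.append([w / total for w in weights])
    return resp


def elbo(xs, resp, pis, mus, variances):
    """Average ELBO: E_q[log p(c) + log p(x | c) - log q(c)] over data points."""
    total = 0.0
    for x, r in zip(xs, resp):
        for c, rc in enumerate(r):
            if rc > 0.0:
                total += rc * (math.log(pis[c])
                               + normal_logpdf(x, mus[c], variances[c])
                               - math.log(rc))
    return total / len(xs)


def entropy_sum(resp, pis, variances):
    """Mean entropy of q_n minus prior entropy minus weighted Gaussian entropies."""
    mean_q_entropy = sum(entropy(r) for r in resp) / len(resp)
    gauss_entropies = [0.5 * math.log(2.0 * math.pi * math.e * v) for v in variances]
    return (mean_q_entropy - entropy(pis)
            - sum(p * h for p, h in zip(pis, gauss_entropies)))


rng = random.Random(0)
xs = [rng.gauss(-2.0, 1.0) for _ in range(100)] + [rng.gauss(3.0, 0.5) for _ in range(100)]
u_init = [rng.uniform(0.1, 0.9) for _ in xs]
resp = [[u, 1.0 - u] for u in u_init]  # arbitrary soft initialization of q

gaps = []
for it in range(10):
    pis, mus, variances = m_step(xs, resp)
    gap = elbo(xs, resp, pis, mus, variances) - entropy_sum(resp, pis, variances)
    gaps.append(gap)
    print(f"iteration {it}: ELBO - entropy sum = {gap:.1e}")
    resp = e_step(xs, pis, mus, variances)
```

The printed gaps should sit at floating-point round-off level at every iteration, not only at convergence. The check runs after the M-step because that is where the parameters are stationary for the current q.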

Subjects: Machine Learning, Information Theory, Machine Learning, Probability, Statistics Theory

Published: 2024-12-25 15:47:23 UTC


#2 Estimating shared subspace with AJIVE: the power and limitation of multiple data matrices

Authors: Yuepeng Yang, Cong Ma

Integrative data analysis often requires disentangling joint and individual variations across multiple datasets, a challenge commonly addressed by the Joint and Individual Variation Explained (JIVE) model. While numerous methods have been developed to estimate the shared subspace under JIVE, the theoretical understanding of their performance remains limited, particularly in the context of multiple matrices and varying levels of subspace misalignment. This paper bridges this gap by providing a systematic analysis of shared subspace estimation in multi-matrix settings. We focus on the Angle-based Joint and Individual Variation Explained (AJIVE) method, a two-stage spectral approach, and establish new performance guarantees that uncover its strengths and limitations. Specifically, we show that in high signal-to-noise ratio (SNR) regimes, AJIVE's estimation error decreases with the number of matrices, demonstrating the power of multi-matrix integration. Conversely, in low-SNR settings, AJIVE exhibits a non-diminishing error, highlighting fundamental limitations. To complement these results, we derive minimax lower bounds, showing that AJIVE achieves optimal rates in high-SNR regimes. Furthermore, we analyze an oracle-aided spectral estimator to demonstrate that the non-diminishing error in low-SNR scenarios is a fundamental barrier. Extensive numerical experiments corroborate our theoretical findings, providing insights into the interplay between SNR, matrix count, and subspace misalignment.
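
The two-stage spectral structure of AJIVE can be sketched in a few lines of NumPy. Stage 1 estimates each matrix's signal subspace from its leading left singular vectors. Stage 2 takes the leading left singular vectors of the concatenated bases as the shared-subspace estimate. This is a simplified sketch under stated assumptions: the per-matrix ranks and the joint rank are taken as known, replacing the data-driven rank and threshold selection of the full method. The simulated model (one shared direction plus one individual direction per matrix, Gaussian noise) and all function names are hypothetical.

```python
import numpy as np


def top_left_singular_vectors(mat, rank):
    """Orthonormal basis for the leading `rank` left singular directions."""
    u, _, _ = np.linalg.svd(mat, full_matrices=False)
    return u[:, :rank]


def ajive_shared_subspace(mats, ranks, joint_rank):
    # Stage 1: estimate each matrix's signal subspace separately.
    bases = [top_left_singular_vectors(m, r) for m, r in zip(mats, ranks)]
    # Stage 2: the leading left singular vectors of the concatenated bases
    # (equivalently, top eigenvectors of the summed projections) estimate
    # the directions shared by all signal subspaces.
    stacked = np.hstack(bases)
    return top_left_singular_vectors(stacked, joint_rank)


def sin_theta_distance(u_hat, u_true):
    """Spectral norm of (I - U U^T) U_hat for orthonormal U = u_true."""
    residual = u_hat - u_true @ (u_true.T @ u_hat)
    return np.linalg.norm(residual, 2)


rng = np.random.default_rng(0)
n, p, num_mats = 50, 30, 4
shared = np.linalg.qr(rng.standard_normal((n, 1)))[0]  # true shared direction
mats = []
for _ in range(num_mats):
    individual = rng.standard_normal((n, 1))
    individual /= np.linalg.norm(individual)
    loadings = 10.0 * rng.standard_normal((2, p))        # signal strength
    signal = np.hstack([shared, individual]) @ loadings
    mats.append(signal + 0.1 * rng.standard_normal((n, p)))  # additive noise

u_hat = ajive_shared_subspace(mats, ranks=[2] * num_mats, joint_rank=1)
print(f"sin-theta error: {sin_theta_distance(u_hat, shared):.3f}")
```

Lowering the loadings scale relative to the noise level, or changing `num_mats`, is one way to experiment with the SNR and matrix-count regimes the abstract discusses. The sketch makes no claim about reproducing the paper's rates.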

Subjects: Machine Learning, Machine Learning, Statistics Theory

Published: 2025-01-16 07:23:26 UTC


#3 Lattice Rules Meet Kernel Cubature

Authors: Vesa Kaarnioja, Ilja Klebanov, Claudia Schillings, Yuya Suzuki

Rank-1 lattice rules are a class of equally weighted quasi-Monte Carlo methods that achieve essentially linear convergence rates for functions in a reproducing kernel Hilbert space (RKHS) characterized by square-integrable first-order mixed partial derivatives. In this work, we explore the impact of replacing the equal weights in lattice rules with optimized cubature weights derived using the reproducing kernel. We establish a theoretical result demonstrating a doubled convergence rate in the one-dimensional case and provide numerical investigations of convergence rates in higher dimensions. We also present numerical results for an uncertainty quantification problem involving an elliptic partial differential equation with a random coefficient.
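
The idea of swapping equal lattice weights for kernel-optimized cubature weights can be sketched as follows. The kernel is an assumption here: the unanchored Sobolev kernel 1 + gamma*(B1(x)B1(y) + B2(|x - y|)/2) per coordinate, with B1 and B2 the Bernoulli polynomials and a single weight gamma. The paper's kernel, weights and generating vectors may differ. Under that kernel each kernel section integrates to 1 over [0, 1]^d, so the optimal weights solve K w = 1. The test integrand exp(x) on [0, 1] is a hypothetical choice, and the function names are illustrative.

```python
import numpy as np


def bernoulli1(t):
    return t - 0.5


def bernoulli2(t):
    return t * t - t + 1.0 / 6.0


def sobolev_kernel(x, y, gamma=1.0):
    """Product kernel 1 + gamma*(B1(x)B1(y) + B2(|x-y|)/2) over coordinates."""
    xi = x[:, None, :]
    yj = y[None, :, :]
    factors = 1.0 + gamma * (bernoulli1(xi) * bernoulli1(yj)
                             + 0.5 * bernoulli2(np.abs(xi - yj)))
    return factors.prod(axis=-1)


def lattice_points(n, z):
    """Rank-1 lattice {i * z / n mod 1 : i = 0, ..., n-1} in [0, 1)^d."""
    i = np.arange(n)[:, None]
    return np.mod(i * np.asarray(z)[None, :], n) / n


def kernel_cubature_weights(points, gamma=1.0):
    """Solve K w = b; here b is all ones since each kernel section integrates to 1."""
    gram = sobolev_kernel(points, points, gamma)
    return np.linalg.solve(gram, np.ones(len(points)))


f = np.exp
exact = np.e - 1.0
errors = {}
for n in (8, 16, 32, 64):
    pts = lattice_points(n, [1])  # in 1D the lattice is just {i/n}
    values = f(pts[:, 0])
    equal_err = abs(values.mean() - exact)
    kernel_err = abs(kernel_cubature_weights(pts) @ values - exact)
    errors[n] = (equal_err, kernel_err)
    print(f"n={n:3d}  equal weights: {equal_err:.2e}  kernel weights: {kernel_err:.2e}")
```

Passing a longer generating vector `z` to `lattice_points` gives a rank-1 lattice in higher dimensions with the same weight computation. Choosing a good `z` is outside the scope of this sketch.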

Subjects: Numerical Analysis, Statistics Theory

Published: 2025-01-16 12:20:42 UTC


#4 Statistical inference for interacting innovation processes and related general results

Authors: Giacomo Aletti, Irene Crimaldi, Andrea Ghiglietti

Given the importance of understanding how different innovation processes affect each other, we have introduced a model for a finite system of interacting innovation processes. The present work focuses on the second-order asymptotic properties of the model and illustrates how to leverage the theoretical results to draw statistical inferences about the intensity of the interaction. We apply the proposed tools to two real data sets (from Reddit and Gutenberg).

Subjects: Methodology, Statistics Theory

Published: 2025-01-16 16:43:05 UTC


#5 A Near-optimal Algorithm for Learning Margin Halfspaces with Massart Noise

Authors: Ilias Diakonikolas, Nikos Zarifis

We study the problem of PAC learning $\gamma$-margin halfspaces in the presence of Massart noise. Without computational considerations, the sample complexity of this learning problem is known to be $\widetilde{\Theta}(1/(\gamma^2 \epsilon))$. Prior computationally efficient algorithms for the problem incur sample complexity $\widetilde{O}(1/(\gamma^4 \epsilon^3))$ and achieve 0-1 error of $\eta+\epsilon$, where $\eta<1/2$ is the upper bound on the noise rate. Recent work gave evidence of an information-computation tradeoff, suggesting that a quadratic dependence on $1/\epsilon$ is required for computationally efficient algorithms. Our main result is a computationally efficient learner with sample complexity $\widetilde{\Theta}(1/(\gamma^2 \epsilon^2))$, nearly matching this lower bound. In addition, our algorithm is simple and practical, relying on online SGD on a carefully selected sequence of convex losses.
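
The following compares only the leading terms of the sample-complexity bounds quoted above, ignoring logarithmic factors and constants. It shows the improvement over prior efficient algorithms and the remaining gap to the information-theoretic rate. The numerical values $\gamma = 0.1$ and $\epsilon = 0.01$ are illustrative choices, not taken from the paper.

```latex
\frac{1/(\gamma^{4}\epsilon^{3})}{1/(\gamma^{2}\epsilon^{2})} = \frac{1}{\gamma^{2}\epsilon}
\quad\text{(improvement over prior efficient algorithms)},
\qquad
\frac{1/(\gamma^{2}\epsilon^{2})}{1/(\gamma^{2}\epsilon)} = \frac{1}{\epsilon}
\quad\text{(gap to the information-theoretic rate)}.

\text{For } \gamma = 0.1,\ \epsilon = 0.01:\quad
\frac{1}{\gamma^{2}\epsilon} = \frac{1}{10^{-2}\cdot 10^{-2}} = 10^{4},
\qquad
\frac{1}{\epsilon} = 10^{2}.
```

The residual factor $1/\epsilon$ is exactly the extra power of $1/\epsilon$ that the cited information-computation evidence suggests efficient algorithms cannot avoid.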

Subjects: Machine Learning, Data Structures and Algorithms, Statistics Theory, Machine Learning

Published: 2025-01-16 17:44:18 UTC


#6 Semiparametrics via parametrics and contiguity

Authors: Adam Lee, Emil A. Stoltenberg, Per A. Mykland

Inference on the parametric part of a semiparametric model is no trivial task. On the other hand, if one approximates the infinite dimensional part of the semiparametric model by a parametric function, one obtains a parametric model that is in some sense close to the semiparametric model; and inference may proceed by the method of maximum likelihood. Under regularity conditions, and assuming that the approximating parametric model in fact generated the data, the ensuing maximum likelihood estimator is asymptotically normal and efficient (in the approximating parametric model). Thus one obtains a sequence of asymptotically normal and efficient estimators in a sequence of growing parametric models that approximate the semiparametric model and, intuitively, the limiting "semiparametric" estimator should be asymptotically normal and efficient as well. In this paper we make this intuition rigorous. Consequently, we are able to move much of the semiparametric analysis back into classical parametric terrain, and then translate our parametric results back to the semiparametric world by way of contiguity. Our approach departs from the sieve literature by being more specific about the approximating parametric models, by working under these when treating the parametric models, and by taking advantage of the mutual contiguity between the parametric and semiparametric models to lift conclusions about the former to conclusions about the latter. We illustrate our theory with two canonical examples of semiparametric models, namely the partially linear regression model and the Cox regression model. An upshot of our theory is a new, relatively simple, and rather parametric proof of the efficiency of the Cox partial likelihood estimator.
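
The basic move of approximating the infinite-dimensional part by a parametric function and then using maximum likelihood can be sketched for the partially linear model y = beta*x + g(t) + noise. In this hypothetical sketch, g is approximated by a polynomial of growing degree. Under Gaussian errors, maximum likelihood in each approximating model reduces to least squares. The polynomial family, the simulated g, and the function name are assumptions for illustration only. The sketch does not reproduce the authors' construction or their contiguity arguments.

```python
import numpy as np


def fit_partially_linear(y, x, t, degree):
    """Least squares (= Gaussian MLE) in the approximating parametric model
    y = beta * x + sum_{j=0}^{degree} a_j t^j + noise."""
    basis = np.vander(t, degree + 1, increasing=True)  # columns 1, t, ..., t^degree
    design = np.column_stack([x, basis])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef[0], coef[1:]


rng = np.random.default_rng(1)
n, beta_true = 500, 2.0
t = rng.uniform(0.0, 1.0, n)
g = np.sin(2.0 * np.pi * t)              # unknown nonparametric part (hypothetical choice)
x = g + rng.normal(0.0, 0.5, n)          # covariate correlated with t through g
y = beta_true * x + g + rng.normal(0.0, 0.5, n)

estimates = {}
for degree in (1, 3, 5, 7):
    beta_hat, _ = fit_partially_linear(y, x, t, degree)
    estimates[degree] = beta_hat
    print(f"degree {degree}: beta_hat = {beta_hat:.3f}")
```

Because x depends on t through g, a too-coarse approximating model leaves part of g confounded with x and biases beta_hat. As the degree grows, the estimates from this sequence of parametric models settle near the true value.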

Subjects: Statistics Theory, Econometrics, Methodology

Published: 2025-01-16 11:40:48 UTC