2024-10-29 | Total: 12
The topology of the tree underlying a tree-structured Markov random field (MRF) is central to the understanding of its stochastic dynamics: it is, after all, what synthesizes the rich dependence relations within the MRF. In this paper, we shed light on the influence of the tree's topology on the aggregate distribution of the MRF through an extensive comparison-based analysis. This is done within the framework of a recently introduced family of tree-structured MRFs with the uncommon property of having fixed Poisson marginal distributions unaffected by the dependence scheme. We establish convex orderings of sums of MRFs encrypted on trees having different topologies, which leads to a new poset of trees. Hasse diagrams, cataloguing trees of dimension up to 9, and methods for the comparison of higher-dimension trees are provided to offer an exhaustive investigation of the new poset. We also briefly discuss its relation to other existing posets of trees and to invariants from spectral graph theory. Such an analysis first requires studying the joint distribution of an MRF's component and its sum, a random vector we refer to as a synecdochic pair. To assess whether one component contributes more or less to the sum than another, we employ stochastic orders to compare synecdochic pairs within an MRF. The resulting orderings are reflected through allocation-related quantities, which thus act as centrality indices.
In this paper, we consider the problem of estimating the interaction parameter $p$ of a $p$-spin Curie-Weiss model at inverse temperature $\beta$, given a single observation from this model. We show, by a contiguity argument, that joint estimation of the parameters $\beta$ and $p$ is impossible, which implies that estimation of $p$ is impossible if $\beta$ is unknown. These impossibility results are also extended to the more general $p$-spin Erdős-Rényi Ising model. The situation is more delicate when $\beta$ is known. In this case, we show that there exists an increasing threshold function $\beta^*(p)$, such that for all $\beta$, consistent estimation of $p$ is impossible when $\beta^*(p) > \beta$, and for almost all $\beta$, consistent estimation of $p$ is possible for $\beta^*(p)<\beta$.
Conditional independence and graphical models are crucial concepts for sparsity and statistical modeling in higher dimensions. For Lévy processes, a widely applied class of stochastic processes, these notions have not been studied. By the Lévy-Itô decomposition, a multivariate Lévy process can be decomposed into the sum of a Brownian motion part and an independent jump process. We show that conditional independence statements between the marginal processes can be studied separately for these two parts. While the Brownian part is well-understood, we derive a novel characterization of conditional independence between the sample paths of the jump process in terms of the Lévy measure. We define Lévy graphical models as Lévy processes that satisfy undirected or directed Markov properties. We prove that the graph structure is invariant under changes of the univariate marginal processes. Lévy graphical models allow the construction of flexible, sparse dependence models for Lévy processes in large dimensions, which are interpretable thanks to the underlying graph. For trees, we develop statistical methodology to learn the underlying structure from low- or high-frequency observations of the Lévy process and show consistent graph recovery. We apply our method to model stock returns from U.S. companies to illustrate the advantages of our approach.
The aim of this article is to write the $p$-Wasserstein metric $W_p$ with the $p$-norm, $p\in [1,\infty)$, on $\mathbb{R}^d$ in terms of copulas. In particular, for one-dimensional distributions, we show that the copula yielding the optimal coupling of the Wasserstein distance is the comonotonicity copula. We obtain the equivalent result for $d$-dimensional distributions under the necessary and sufficient condition that these have the same dependence structure of their one-dimensional marginals, i.e., that the $d$-dimensional distributions share the same copula. Assuming $p\neq q$, $p,q \in [1,\infty)$, and that the probability measures $\mu$ and $\nu$ share the same copula, we also analyze the Wasserstein distance $W_{p,q}$ discussed in \cite{Alfonsi} and obtain upper and lower bounds on $W_{p,q}$ in terms of $W_p$, written in terms of the comonotonicity copula. We show that, as a consequence, the lower and upper bounds on $W_{p,q}$ can be written in terms of generalized inverse functions.
We propose a general transfer learning framework for clustering given a main dataset and an auxiliary one about the same subjects. The two datasets may reflect similar but different latent grouping structures of the subjects. We propose an adaptive transfer clustering (ATC) algorithm that automatically leverages the commonality in the presence of unknown discrepancy, by optimizing an estimated bias-variance decomposition. It applies to a broad class of statistical models including Gaussian mixture models, stochastic block models, and latent class models. A theoretical analysis proves the optimality of ATC under the Gaussian mixture model and explicitly quantifies the benefit of transfer. Extensive simulations and real data experiments confirm our method's effectiveness in various scenarios.
We prove that there is a universal constant $C>0$ so that for every $d \in \mathbb N$, every centered subgaussian distribution $\mathcal D$ on $\mathbb R^d$, and every even $p \in \mathbb N$, the $d$-variate polynomial $(Cp)^{p/2} \cdot \|v\|_{2}^p - \mathbb E_{X \sim \mathcal D} \langle v,X\rangle^p$ is a sum of square polynomials. This establishes that every subgaussian distribution is \emph{SoS-certifiably subgaussian} -- a condition that yields efficient learning algorithms for a wide variety of high-dimensional statistical tasks. As a direct corollary, we obtain computationally efficient algorithms with near-optimal guarantees for the following tasks, when given samples from an arbitrary subgaussian distribution: robust mean estimation, list-decodable mean estimation, clustering mean-separated mixture models, robust covariance-aware mean estimation, robust covariance estimation, and robust linear regression. Our proof makes essential use of Talagrand's generic chaining/majorizing measures theorem.
Bayesian inference for doubly intractable distributions is challenging because they include intractable terms, which are functions of parameters of interest. Although several alternatives have been developed for such models, they are computationally intensive due to repeated auxiliary variable simulations. We propose a novel Monte Carlo Stein variational gradient descent (MC-SVGD) approach for inference for doubly intractable distributions. Through an efficient gradient approximation, our MC-SVGD approach rapidly transforms an arbitrary reference distribution to approximate the posterior distribution of interest, without necessitating any predefined variational distribution class for the posterior. Such a transport map is obtained by minimizing Kullback-Leibler divergence between the transformed and posterior distributions in a reproducing kernel Hilbert space (RKHS). We also investigate the convergence rate of the proposed method. We illustrate the application of the method to challenging examples, including a Potts model, an exponential random graph model, and a Conway--Maxwell--Poisson regression model. The proposed method achieves substantial computational gains over existing algorithms, while providing comparable inferential performance for the posterior distributions.
We introduce the almost goodness-of-fit test, a procedure to decide if a (parametric) model provides a good representation of the probability distribution generating the observed sample. We consider the approximate model determined by an M-estimator of the parameters as the best representative of the unknown distribution within the parametric class. The objective is the approximate validation of a distribution or an entire parametric family up to a pre-specified threshold value, the margin of error. The methodology also allows quantifying the percentage improvement of the proposed model compared to a non-informative (constant) one. The test statistic is the $\mathrm{L}^p$-distance between the empirical distribution function and the corresponding one of the estimated (parametric) model. The value of the parameter $p$ allows modulating the impact of the tails of the distribution in the validation of the model. By deriving the asymptotic distribution of the test statistic, as well as proving the consistency of its bootstrap approximation, we present an easy-to-implement and flexible method. The performance of the proposal is illustrated with a simulation study and the analysis of a real dataset.
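As a rough illustration of such a test statistic, the sketch below computes an $\mathrm{L}^p$-distance between the empirical distribution function and the distribution function of a fitted normal model. Several choices here are assumptions made for the example only and are not taken from the paper: the normal family, the maximum-likelihood fit (one particular M-estimator), integration with respect to Lebesgue measure on a truncated range, and the midpoint rule. The name `lp_distance_to_fitted_normal` is hypothetical. The asymptotic and bootstrap calibration described in the abstract is not reproduced.

```python
import math
from bisect import bisect_right


def normal_cdf(x, mu, sigma):
    """Distribution function of N(mu, sigma^2)."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))


def lp_distance_to_fitted_normal(sample, p=2.0, grid_size=20000, pad=8.0):
    """Approximate (integral |F_n(x) - F_theta(x)|^p dx)^(1/p) for a fitted normal.

    Assumptions (illustrative only): Lebesgue integration, a range truncated
    to `pad` fitted standard deviations beyond the data, midpoint quadrature.
    """
    n = len(sample)
    if n < 2:
        raise ValueError("need at least two observations")
    # Maximum-likelihood estimates of the normal parameters.
    mu = sum(sample) / n
    sigma = math.sqrt(sum((x - mu) ** 2 for x in sample) / n)
    if sigma == 0.0:
        raise ValueError("sample is constant; normal fit is degenerate")
    data = sorted(sample)
    lo = data[0] - pad * sigma
    hi = data[-1] + pad * sigma
    step = (hi - lo) / grid_size
    total = 0.0
    for i in range(grid_size):
        x = lo + (i + 0.5) * step
        ecdf = bisect_right(data, x) / n  # empirical distribution function at x
        total += abs(ecdf - normal_cdf(x, mu, sigma)) ** p * step
    return total ** (1.0 / p)


statistic = lp_distance_to_fitted_normal([0.0, 1.0, 2.0, 3.0, 4.0], p=2.0)
```

In this sketch, the value of `statistic` would then be compared with a pre-specified margin of error; larger `p` puts relatively less weight on the small discrepancies found in the tails.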
Large-scale datasets with count outcome variables are widely present in various applications, and the Poisson regression model is among the most popular models for handling count outcomes. This paper considers the high-dimensional sparse Poisson regression model and proposes bias-corrected estimators for both linear and quadratic transformations of high-dimensional regression vectors. We establish the asymptotic normality of the estimators, construct asymptotically valid confidence intervals, and conduct related hypothesis testing. We apply the devised methodology to high-dimensional mediation analysis with count outcome, with particular application of testing for the existence of interaction between the treatment variable and high-dimensional mediators. We demonstrate the proposed methods through extensive simulation studies and application to real-world epigenetic data.
Federated Learning (FL) has emerged as a groundbreaking paradigm in collaborative machine learning, emphasizing decentralized model training to address data privacy concerns. While significant progress has been made in optimizing federated learning, the exploration of generalization error, particularly in heterogeneous settings, has been limited, focusing mainly on parametric cases. This paper investigates the generalization properties of deep federated regression within a two-stage sampling model. Our findings highlight that the intrinsic dimension, defined by the entropic dimension, is crucial for determining convergence rates when appropriate network sizes are used. Specifically, if the true relationship between response and explanatory variables is characterized by a $\beta$-Hölder function and there are $n$ independent and identically distributed (i.i.d.) samples from $m$ participating clients, the error rate for participating clients scales at most as $\tilde{O}\left((mn)^{-2\beta/(2\beta + \bar{d}_{2\beta}(\lambda))}\right)$, and for non-participating clients, it scales as $\tilde{O}\left(\Delta \cdot m^{-2\beta/(2\beta + \bar{d}_{2\beta}(\lambda))} + (mn)^{-2\beta/(2\beta + \bar{d}_{2\beta}(\lambda))}\right)$. Here, $\bar{d}_{2\beta}(\lambda)$ represents the $2\beta$-entropic dimension of $\lambda$, the marginal distribution of the explanatory variables, and $\Delta$ characterizes the dependence between the sampling stages. Our results explicitly account for the "closeness" of clients, demonstrating that the convergence rates of deep federated learners depend on intrinsic rather than nominal high-dimensionality.
Causal structure learning with data from multiple contexts carries both opportunities and challenges. Opportunities arise from considering shared and context-specific causal graphs, which makes it possible to generalize and transfer causal knowledge across contexts. However, a challenge that is currently understudied in the literature is the impact of differing observational support between contexts on the identifiability of causal graphs. Here we study in detail recently introduced causal graph objects [6] that capture both causal mechanisms and data support, allowing for the analysis of a larger class of context-specific changes and a more precise characterization of distribution shifts. We thereby extend results on the identifiability of context-specific causal structures and propose a framework to model context-specific independence (CSI) within structural causal models (SCMs) in a refined way that allows the exploration of scenarios where these graph objects differ. We demonstrate how this framework can help explain phenomena like anomalies or extreme events, where causal mechanisms change or appear to change under different conditions. Our results contribute to the theoretical foundations for understanding causal relations in multi-context systems, with implications for generalization, transfer learning, and anomaly detection. Future work may extend this approach to more complex data types, such as time series.
Graph Convolutional Networks (GCNs) have become a pivotal method in machine learning for modeling functions over graphs. Despite their widespread success across various applications, their statistical properties (e.g., consistency, convergence rates) remain ill-characterized. To begin addressing this knowledge gap, we provide a formal analysis of the impact of convolution operators on regression tasks over homophilic networks. Focusing on estimators based solely on neighborhood aggregation, we examine how two common convolutions, the original GCN and GraphSage convolutions, affect the learning error as a function of the neighborhood topology and the number of convolutional layers. We explicitly characterize the bias-variance trade-off incurred by GCNs as a function of the neighborhood size and identify specific graph topologies where convolution operators are less effective. Our theoretical findings are corroborated by synthetic experiments and provide a first step toward a deeper quantitative understanding of convolutional effects in GCNs, with the aim of offering rigorous guidelines for practitioners.
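To make the notion of an estimator based solely on neighborhood aggregation concrete, here is a minimal sketch of two one-layer aggregation operators applied to noisy node responses. The first uses the symmetric normalization $\tilde{D}^{-1/2}(A+I)\tilde{D}^{-1/2}$ of the original GCN; the second is a plain mean over a node and its neighbours, in the spirit of a GraphSage-style mean aggregator. The function names, the adjacency-list representation, and the inclusion of self-loops in the mean are assumptions for illustration, and the paper's exact operators may differ.

```python
import math


def gcn_aggregate(adj, y):
    """One pass of y -> D^{-1/2} (A + I) D^{-1/2} y, with self-loops added.

    `adj[i]` lists the neighbours of node i (undirected, no self-loops).
    """
    n = len(y)
    deg = [len(adj[i]) + 1 for i in range(n)]  # degrees including the self-loop
    out = []
    for i in range(n):
        total = y[i] / deg[i]  # self-loop term: 1 / sqrt(deg_i * deg_i)
        for j in adj[i]:
            total += y[j] / math.sqrt(deg[i] * deg[j])
        out.append(total)
    return out


def mean_aggregate(adj, y):
    """One pass of plain averaging over each node and its neighbours."""
    n = len(y)
    return [(y[i] + sum(y[j] for j in adj[i])) / (len(adj[i]) + 1) for i in range(n)]


# Path graph 0 - 1 - 2 with a noisy response observed at each node.
path = [[1], [0, 2], [1]]
noisy = [1.2, 0.9, 0.9]
smoothed_gcn = gcn_aggregate(path, noisy)
smoothed_mean = mean_aggregate(path, noisy)
```

Applying either function repeatedly mimics stacking convolutional layers: each pass averages over a wider neighbourhood, which reduces variance but can add bias when distant nodes have different mean responses. On regular graphs the two operators coincide, while on graphs with uneven degrees the symmetric normalization weights neighbours differently.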