2025-04-25 | Total: 8
The topic of multiple hypotheses testing now has a potpourri of novel theories and ubiquitous applications in diverse scientific fields. However, the universal utility of this field often hinders the possibility of having a generalized theory that accommodates every scenario. This tradeoff is better reflected through the lens of dependence, a central piece behind the theoretical and applied developments of multiple testing. Although omnipresent in many scientific avenues, the nature and extent of dependence vary substantially with the context and complexity of the particular scenario. Positive dependence is the norm in testing many treatments versus a single control or in spatial statistics. In contrast, negative dependence arises naturally in tests based on split samples and in cyclical, ordered comparisons. In GWAS, the SNP markers are generally considered to be weakly dependent. Generalized familywise error rate (k-FWER) control has been one of the prominent frequentist approaches in simultaneous inference. However, the performance of k-FWER controlling procedures remains unexplored under different dependencies. This paper revisits the classical testing problem of normal means in different correlated frameworks. We establish upper bounds on the generalized familywise error rates under each dependence, consequently giving rise to improved testing procedures. Towards this, we present improved probability inequalities, which are of independent theoretical interest.
This article exists first and foremost to contribute to a tribute to Patrick Cattiaux. One of the two authors has known Patrick Cattiaux for a very long time, and owes him a great deal. If we are to illustrate the adage that life is made up of chance, then what could be better than the meeting of two young people in the 80s, both of whom fell in love with the mathematics of randomness, and one of whom changed the other's life by letting him in on a secret: if you really believe in it, you can turn this passion into a profession. By another happy coincidence, this tribute comes at just the right time, as Michel Talagrand has been awarded the Abel prize. The temptation was therefore great to do a double. Following one of the many galleries opened up by mathematics, we shall first draw a link between the mathematics of Patrick Cattiaux and that of Michel Talagrand. Then we shall show how the abstract probabilistic material on the concentration of product measures thus revisited can be used to shed light on cut-off phenomena in our field of expertise, mathematical statistics. Nothing revolutionary here, as everyone knows the impact that Talagrand's work has had on the development of mathematical statistics since the late 90s, but we've chosen a very simple framework in which everything can be explained with minimal technicality, leaving the main ideas to the fore.
The null hypothesis of equality of distributions of functional data coming from $K$ samples is considered. The proposed test statistic is multivariate and its components are based on pairwise Cramér-von Mises comparisons of empirical characteristic functionals. The significance of the test statistic is evaluated via a novel multivariate permutation test, where the final single $p$-value is computed using discrete optimal measure transport. The methodology is illustrated by real data on cumulative intraday returns of Bitcoin.
The celebrated theorem of Chung, Graham, and Wilson on quasirandom graphs implies that if the 4-cycle and edge counts in a graph $G$ are both close to their typical number in $\mathbb{G}(n,1/2),$ then this also holds for the counts of subgraphs isomorphic to $H$ for any $H$ of constant size. We aim to prove a similar statement where the notion of close is whether the given (signed) subgraph count can be used as a test between $\mathbb{G}(n,1/2)$ and a stochastic block model $\mathbb{SBM}.$ Quantitatively, this is related to approximately maximizing $H \longrightarrow |\Phi(H)|^{\frac{1}{|\mathsf{V}(H)|}},$ where $\Phi(H)$ is the Fourier coefficient of $\mathbb{SBM}$, indexed by subgraph $H.$ This formulation turns out to be equivalent to approximately maximizing the partition function of a spin model over alphabet equal to the community labels in $\mathbb{SBM}.$ We resolve the approximate maximization when $\mathbb{SBM}$ satisfies one of four conditions: 1) the probability of an edge between any two vertices in different communities is exactly $1/2$; 2) the probability of an edge between two vertices from any two communities is at least $1/2$ (this case is also covered in a recent work of Yu, Zadik, and Zhang); 3) the probability of belonging to any given community is at least $c$ for some universal constant $c>0$; 4) $\mathbb{SBM}$ has two communities. In each of these cases, we show that there is an approximate maximizer of $|\Phi(H)|^{\frac{1}{|\mathsf{V}(H)|}}$ in the set $\mathsf{A} = \{\text{stars, 4-cycle}\}.$ This implies that if there exists a constant-degree polynomial test distinguishing $\mathbb{G}(n,1/2)$ and $\mathbb{SBM},$ then the two distributions can also be distinguished via the signed count of some graph in $\mathsf{A}.$ We conjecture that the same holds true for distinguishing $\mathbb{G}(n,1/2)$ and any graphon if we also add triangles to $\mathsf{A}.$
We consider the problem of recovering latent information from graphs under $\varepsilon$-edge local differential privacy where the presence of relationships/edges between two users/vertices remains confidential, even from the data curator. For the class of generalized random dot-product graphs, we show that a standard local differential privacy mechanism induces a specific geometric distortion in the latent positions. Leveraging this insight, we show that consistent recovery of the latent positions is achievable by appropriately adjusting the statistical inference procedure for the privatized graph. Furthermore, we prove that our procedure is nearly minimax-optimal under local edge differential privacy constraints. Lastly, we show that this framework allows for consistent recovery of geometric and topological information underlying the latent positions, as encoded in their persistence diagrams. Our results extend previous work from the private community detection literature to a substantially richer class of models and inferential tasks.
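To make the idea of a distortion induced by privatization concrete, the sketch below assumes the mechanism is symmetric randomized response on each potential edge, with flip probability $q = 1/(1+e^{\varepsilon})$; the abstract does not name the mechanism, so this choice and all function names are assumptions. Under it, the expected privatized adjacency entry is $(1-2q)A_{ij} + q$, an affine map of the original that a simple rescaling undoes in expectation.

```python
import math
import random


def flip_probability(epsilon):
    """Flip probability of symmetric randomized response at privacy level epsilon."""
    return 1.0 / (1.0 + math.exp(epsilon))


def privatize_edges(adjacency, epsilon, rng):
    """Flip each off-diagonal entry of a symmetric 0/1 matrix independently.

    Assumed mechanism (not confirmed by the abstract): each potential edge is
    flipped with probability q, once per unordered vertex pair.
    """
    n = len(adjacency)
    q = flip_probability(epsilon)
    private = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            bit = adjacency[i][j]
            if rng.random() < q:
                bit = 1 - bit
            private[i][j] = private[j][i] = bit
    return private


def debias_entry(private_bit, epsilon):
    """Unbiased estimate of the original bit: (bit - q) / (1 - 2q)."""
    if epsilon <= 0:
        raise ValueError("epsilon must be positive for debiasing")
    q = flip_probability(epsilon)
    return (private_bit - q) / (1.0 - 2.0 * q)


# Toy path graph on three vertices (illustrative input only).
graph = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]
private_graph = privatize_edges(graph, epsilon=1.0, rng=random.Random(0))
debiased = [[debias_entry(b, 1.0) if i != j else 0.0
             for j, b in enumerate(row)] for i, row in enumerate(private_graph)]
```

The debiased matrix is no longer binary, so any downstream spectral or latent-position estimate has to accept real-valued entries; how the paper adjusts its inference procedure is not specified here.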
We study the continuous-time version of the empirical correlation coefficient between the paths of two possibly correlated Ornstein-Uhlenbeck processes, known as Yule's nonsense correlation for these paths. Using sharp tools from the analysis on Wiener chaos, we establish the asymptotic normality of the fluctuations of this correlation coefficient around its long-time limit, which is the mathematical correlation coefficient between the two processes. This asymptotic normality is quantified in Kolmogorov distance, which allows us to establish speeds of convergence in the Type-II error for two simple tests of independence of the paths, based on the empirical correlation, and based on its numerator. An application to independence of two observations of solutions to the stochastic heat equation is given, with excellent asymptotic power properties using merely a small number of the solutions' Fourier modes.
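The statistic in question can be illustrated with a small simulation. The sketch below assumes two Ornstein-Uhlenbeck processes with a common mean-reversion rate `theta`, started at zero and driven by Brownian motions with correlation `r`, and it approximates the continuous-time empirical correlation by a Riemann sum on a uniform grid. These modelling choices, the parameter values, and the function names are assumptions made for illustration, not the paper's setup.

```python
import math
import random


def simulate_correlated_ou(n_steps, dt, theta, r, rng):
    """Exact-in-law discretization of two OU paths dX = -theta * X dt + dW.

    The driving noises have correlation r; both paths start at 0.
    """
    decay = math.exp(-theta * dt)
    noise_sd = math.sqrt((1.0 - decay ** 2) / (2.0 * theta))
    x, y = [0.0], [0.0]
    for _ in range(n_steps):
        z1 = rng.gauss(0.0, 1.0)
        z2 = r * z1 + math.sqrt(1.0 - r * r) * rng.gauss(0.0, 1.0)
        x.append(decay * x[-1] + noise_sd * z1)
        y.append(decay * y[-1] + noise_sd * z2)
    return x, y


def empirical_correlation(x, y):
    """Discretized empirical correlation of two paths on the same grid.

    The grid spacing cancels between numerator and denominator.
    """
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


rng = random.Random(12345)
x_path, y_path = simulate_correlated_ou(n_steps=5000, dt=0.05, theta=1.0, r=0.8, rng=rng)
print(empirical_correlation(x_path, y_path))
```

Setting `r=0.0` and shortening the horizon shows how far the statistic can stray from zero for independent paths, which is the "nonsense correlation" effect that the long-time asymptotics control.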
In many practical situations, randomly assigning treatments to subjects is uncommon due to feasibility constraints. For example, economic aid programs and merit-based scholarships are often restricted to those meeting specific income or exam score thresholds. In these scenarios, traditional approaches to estimating treatment effects typically focus solely on observations near the cutoff point, thereby excluding a significant portion of the sample and potentially leading to information loss. Moreover, these methods generally achieve a non-parametric convergence rate. While some approaches, e.g., Mukherjee et al. (2021), attempt to tackle these issues, they commonly assume that treatment effects are constant across individuals, an assumption that is often unrealistic in practice. In this study, we propose a differencing and matching-based estimator of the average treatment effect on the treated (ATT) in the presence of heterogeneous treatment effects, utilizing all available observations. We establish the asymptotic normality of our estimator and illustrate its effectiveness through various synthetic and real data analyses. Additionally, we demonstrate that our method yields non-parametric estimates of the conditional average treatment effect (CATE) and individual treatment effect (ITE) as a byproduct.
In this study, we discuss the relationship between two families of density-power-based divergences with functional degrees of freedom -- the Hölder divergence and the functional density power divergence (FDPD) -- based on their intersection and generalization. These divergence families include the density power divergence and the $\gamma$-divergence as special cases. First, we prove that the intersection of the Hölder divergence and the FDPD is limited to a general divergence family introduced by Jones et al. (Biometrika, 2001). Subsequently, motivated by the fact that Hölder's inequality is used in the proofs of nonnegativity for both the Hölder divergence and the FDPD, we define a generalized divergence family, referred to as the $\xi$-Hölder divergence. The nonnegativity of the $\xi$-Hölder divergence is established through a combination of the inequalities used to prove the nonnegativity of the Hölder divergence and the FDPD. Furthermore, we derive an inequality between the composite scoring rules corresponding to different FDPDs based on the $\xi$-Hölder divergence. Finally, we prove that imposing the mathematical structure of the Hölder score on a composite scoring rule results in the $\xi$-Hölder divergence.