2025-07-14 | | Total: 4
This paper advances the general theory of continuous sparse regularisation on measures with the Beurling-LASSO (BLASSO). This TV-regularized convex program on the space of measures allows to recover a sparse measure using a noisy observation from an appropriate measurement operator. While previous works have uncovered the central role played by this operator and its associated kernel in order to get estimation error bounds, the latter requires a technical local positive curvature (LPC) assumption to be verified on a case-by-case basis. In practice, this yields only few LPC-kernels for which this condition is proved. At the heart of our contribution lies the kernel switch, which uncouples the model kernel from the LPC assumption: it enables to leverage any known LPC-kernel as a pivot kernel to prove error bounds, provided embedding conditions are verified between the model and pivot RKHS. We increment the list of LPC-kernels, proving that the "sinc-4" kernel, used for signal recovery and mixture problems, does satisfy the LPC assumption. Furthermore, we also show that the BLASSO localisation error around the true support decreases with the noise level, leading to effective near regions. This improves on known results where this error is fixed with some parameters depending on the model kernel. We illustrate the interest of our results in the case of translation-invariant mixture model estimation, using bandlimiting smoothing and sketching techniques to reduce the computational burden of BLASSO.
The theory of testing statistical functionals is developed for non-parametric two-sample problems. For differentiable real-valued statistical functionals, some tests for the one-sided and two-sided cases are proposed and studied. The asymptotic power function is computed along implicit alternatives and hypotheses and compared with upper bounds for the power functions of tests for limit experiments. It is shown that the proposed tests are locally asymptotically most powerful under weak assumptions. For some important test problems, the distribution-free asymptotic optimal tests are developed. Among other things, the products and sums of von Mises functionals and the Wilcoxon functional are treated. We pay special attention to the computability of the tests.
Firstly, assuming Gaussianity, equations for the following information theory measures are presented: total correlation/coherence (TC), dual total correlation/coherence (DTC), O-information, TSE complexity, and the redundancy-synergy index (RSI). Since these measures are functions of the covariance matrix "S" and its inverse "S^-1", the associated Wishart and inverse-Wishart distributions are of note. The DTC is shown here to be the Kullback-Leibler (KL) divergence for the inverse-Wishart pair "(S^-1)" and its diagonal matrix "diag(S^-1)", shedding light on its interpretation as a measure of "total partial correlation", -lndetP, with test hypothesis H0: P=I, where "P" is the standardized inverse covariance (i.e. P=(D^-1/2)(S^-1)(D^-1/2), with D=diag(S^-1)). The second aim of this paper introduces a generalization of all these measures for structured groups of variables. For instance, consider three or more groups, each consisting of three or more variables, with predominant redundancy within each group, but with synergistic interactions between groups. O-information will miss the between group synergy (since redundancy occurs more often in the system). In contrast, the structured O-information measure presented here will correctly report predominant synergy between groups. This is a relevant generalization towards structured multivariate information measures. A third aim is the presentation of a framework for quantifying the contribution of "connections" between variables, to the system's TC, DTC, O-information, and TSE complexity. A fourth aim is to present a generalization of the redundancy-synergy index for quantifying the contribution of a group of variables to the system's redundancy-synergy balance. Finally, it is shown that the expressions derived here directly apply to data from several other elliptical distributions. All program codes, data files, and executables are available.
\textit{Mallows model} is a widely-used probabilistic framework for learning from ranking data, with applications ranging from recommendation systems and voting to aligning language models with human preferences~\cite{chen2024mallows, kleinberg2021algorithmic, rafailov2024direct}. Under this model, observed rankings are noisy perturbations of a central ranking $\sigma$, with likelihood decaying exponentially in distance from $\sigma$, i.e, $P (\pi) \propto \exp\big(-\beta \cdot d(\pi, \sigma)\big),$ where $\beta > 0$ controls dispersion and $d$ is a distance function. Existing methods mainly focus on fixed distances (such as Kendall's $\tau$ distance), with no principled approach to learning the distance metric directly from data. In practice, however, rankings naturally vary by context; for instance, in some sports we regularly see long-range swaps (a low-rank team beating a high-rank one), while in others such events are rare. Motivated by this, we propose a generalization of Mallows model that learns the distance metric directly from data. Specifically, we focus on $L_\alpha$ distances: $d_\alpha(\pi,\sigma):=\sum_{i=1} |\pi(i)-\sigma(i)|^\alpha$. For any $\alpha\geq 1$ and $\beta>0$, we develop a Fully Polynomial-Time Approximation Scheme (FPTAS) to efficiently generate samples that are $\epsilon$- close (in total variation distance) to the true distribution. Even in the special cases of $L_1$ and $L_2$, this generalizes prior results that required vanishing dispersion ($\beta\to0$). Using this sampling algorithm, we propose an efficient Maximum Likelihood Estimation (MLE) algorithm that jointly estimates the central ranking, the dispersion parameter, and the optimal distance metric. We prove strong consistency results for our estimators (for any values of $\alpha$ and $\beta$), and we validate our approach empirically using datasets from sports rankings.