2025-03-28 | Total: 4
Decision making under uncertainty is a cross-cutting challenge in science and engineering. Most approaches to this challenge employ probabilistic representations of uncertainty. In complicated systems accessible only via data or black-box models, however, these representations are rarely known. We discuss how to characterize and manipulate such representations using triangular transport maps, which approximate any complex probability distribution as a transformation of a simple, well-understood distribution. The particular structure of triangular transport guarantees many desirable mathematical and computational properties that translate well into solving practical problems. Triangular maps are actively used for density estimation, (conditional) generative modelling, Bayesian inference, data assimilation, optimal experimental design, and related tasks. While there is ample literature on the development and theory of triangular transport methods, this manuscript provides a detailed introduction for scientists interested in employing measure transport without assuming a formal mathematical background. We build intuition for the key foundations of triangular transport, discuss many aspects of its practical implementation, and outline the frontiers of this field.
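A minimal sketch of the idea, assuming a two-dimensional Gaussian target (an illustrative choice, not an example taken from the manuscript). In this special case the lower-triangular (Knothe-Rosenblatt) map is linear and is given by the Cholesky factor of the covariance. The function names and the numbers in `mu` and `cov` are hypothetical. The sketch shows the two uses mentioned above: generating samples by inverting the map, and drawing conditional samples by fixing the first coordinate and inverting only the second map component.

```python
import numpy as np

# Illustrative 2D Gaussian target (values are assumptions for this sketch).
mu = np.array([1.0, -1.0])
cov = np.array([[2.0, 1.2],
                [1.2, 1.5]])

# Lower-triangular Cholesky factor: cov = L @ L.T
L = np.linalg.cholesky(cov)

def forward_map(x):
    """Triangular map S: target samples -> standard normal reference samples."""
    return np.linalg.solve(L, (x - mu).T).T

def inverse_map(z):
    """Inverse map S^{-1}: reference samples -> target samples."""
    return mu + z @ L.T

def sample_conditional(x1, n, rng):
    """Draw n samples of x2 given x1 by inverting only the second map component."""
    z1 = (x1 - mu[0]) / L[0, 0]          # first component depends on x1 only
    z2 = rng.standard_normal(n)          # fresh reference samples for x2
    return mu[1] + L[1, 0] * z1 + L[1, 1] * z2

rng = np.random.default_rng(0)
z = rng.standard_normal((10_000, 2))
x = inverse_map(z)                       # generative use: push reference to target
x2_given_x1 = sample_conditional(2.0, 10_000, rng)
```

For non-Gaussian targets the map components are nonlinear and have to be learned from samples or density evaluations, but the triangular structure that makes the conditional sampling step possible is the same.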
We introduce a novel approach to compositional data analysis based on $L^{\infty}$-normalization, addressing challenges posed by zero-rich high-throughput data. Traditional methods like Aitchison's transformations require excluding zeros, conflicting with the reality that omics datasets contain structural zeros that cannot be removed without violating inherent biological structures. Such datasets exist exclusively on the boundary of compositional space, making interior-focused approaches fundamentally misaligned. We present a family of $L^p$-normalizations, focusing on $L^{\infty}$-normalization due to its advantageous properties. This approach identifies compositional space with the $L^{\infty}$-simplex, represented as a union of top-dimensional faces called $L^{\infty}$-cells. Each cell consists of samples where one component's absolute abundance equals or exceeds all others, with a coordinate system identifying it with a d-dimensional unit cube. When applied to vaginal microbiome data, $L^{\infty}$-decomposition aligns with established Community State Types while offering advantages: each $L^{\infty}$-CST is named after its dominating component, has clear biological meaning, remains stable under sample changes, resolves cluster-based issues, and provides a coordinate system for exploring internal structure. We extend homogeneous coordinates through cube embedding, mapping data into a d-dimensional unit cube. These embeddings can be integrated via Cartesian product, providing unified representations from multiple perspectives. While demonstrated through microbiome studies, these methods apply to any compositional data.
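The normalization described above can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: samples are rows of a non-negative count matrix, the $L^{\infty}$-cell of a sample is taken to be the index of its largest component (ties go to the first maximum), and the cube coordinates are taken to be the remaining normalized components. The function names are hypothetical.

```python
import numpy as np

def linf_normalize(counts):
    """Divide each sample (row) by its largest component; zeros stay zeros."""
    counts = np.asarray(counts, dtype=float)
    row_max = counts.max(axis=1, keepdims=True)
    if np.any(row_max <= 0):
        raise ValueError("every sample needs at least one positive component")
    return counts / row_max

def linf_cell(counts):
    """Index of the dominating component (first maximum in case of ties)."""
    return np.argmax(np.asarray(counts, dtype=float), axis=1)

def cube_coordinates(counts):
    """Drop the dominating component; the rest lie in the unit cube [0, 1]^(D-1)."""
    normalized = linf_normalize(counts)
    cells = linf_cell(counts)
    n, D = normalized.shape
    keep = np.ones((n, D), dtype=bool)
    keep[np.arange(n), cells] = False
    return normalized[keep].reshape(n, D - 1)

counts = np.array([[10, 0, 5],
                   [0, 4, 2]])
normalized = linf_normalize(counts)   # [[1, 0, 0.5], [0, 1, 0.5]]
cells = linf_cell(counts)             # [0, 1]
coords = cube_coordinates(counts)     # [[0, 0.5], [0, 0.5]]
```

No zero replacement is needed at any step, and rescaling a sample by a positive constant leaves its normalized form unchanged.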
There is increasing interest in flexible parametric models for the analysis of time-to-event data, yet Bayesian approaches, which allow prior knowledge to be incorporated, remain underused. A flexible Bayesian parametric model has recently been proposed that uses M-splines to model the hazard function. We conducted a simulation study to assess the statistical performance of this model, which is implemented in the survextrap R package. Our simulation uses data-generating mechanisms that produce realistic survival data, based on two oncology clinical trials. Statistical performance is compared across a range of flexible models, varying the M-spline specification, smoothing procedure, priors, and other computational settings. We demonstrate good performance across realistic scenarios, including good fit of complex baseline hazard functions and time-dependent covariate effects. This work informs key considerations for model selection and identifies appropriate default model settings in the software that should perform well in a broad range of applications.
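To make the M-spline hazard concrete, here is a small sketch. The abstract does not spell out the hazard form, so the sketch assumes the form $h(t) = \eta \sum_k p_k b_k(t)$, where the $b_k$ are M-spline basis functions (each integrating to one), the weights $p_k$ sum to one, and $\eta$ is a scale. The basis is built with the standard M-spline recursion. The function name, knots, weights, and `eta` are all hypothetical, and none of this is the survextrap API.

```python
import numpy as np

def mspline_basis(x, knots, order):
    """Evaluate all M-spline basis functions of the given order at points x.

    `knots` is the full non-decreasing knot vector, with each boundary knot
    repeated `order` times. Returns an array of shape (len(x), len(knots) - order).
    Points equal to the last knot are outside the half-open intervals used here.
    """
    x = np.asarray(x, dtype=float)
    t = np.asarray(knots, dtype=float)
    # Order 1: normalized indicator of [t_i, t_{i+1}); empty intervals give zero.
    basis = np.zeros((x.size, len(t) - 1))
    for i in range(len(t) - 1):
        width = t[i + 1] - t[i]
        if width > 0:
            inside = (x >= t[i]) & (x < t[i + 1])
            basis[inside, i] = 1.0 / width
    # Raise the order step by step with the M-spline recursion.
    for k in range(2, order + 1):
        new = np.zeros((x.size, len(t) - k))
        for i in range(len(t) - k):
            span = t[i + k] - t[i]
            if span > 0:
                new[:, i] = k * ((x - t[i]) * basis[:, i]
                                 + (t[i + k] - x) * basis[:, i + 1]) / ((k - 1) * span)
        basis = new
    return basis

# Cubic M-splines (order 4) on [0, 3] with interior knots at 1 and 2.
knots = np.array([0, 0, 0, 0, 1, 2, 3, 3, 3, 3], dtype=float)
weights = np.array([0.05, 0.15, 0.30, 0.30, 0.15, 0.05])  # sums to one
eta = 1.5                                                  # hazard scale
times = np.linspace(0.0, 2.999, 7)
hazard = eta * mspline_basis(times, knots, order=4) @ weights
```

Because every basis function is non-negative, any non-negative weight vector yields a valid (non-negative) hazard, which is one reason M-splines are convenient for this purpose.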
Many Monte Carlo (MC) and importance sampling (IS) methods use mixture models (MMs) for their simplicity and ability to capture multimodal distributions. Recently, subtractive mixture models (SMMs), i.e. MMs with negative coefficients, have shown greater expressiveness and success in generative modeling. However, their negative parameters complicate sampling, requiring costly auto-regressive techniques or accept-reject algorithms that do not scale in high dimensions. In this work, we use the difference representation of SMMs to construct an unbiased IS estimator ($\Delta\text{Ex}$) that removes the need to sample from the SMM, enabling high-dimensional expectation estimation with SMMs. In our experiments, we show that $\Delta\text{Ex}$ can achieve comparable estimation quality to auto-regressive sampling while being considerably faster in MC estimation. Moreover, we conduct initial experiments with $\Delta\text{Ex}$ using hand-crafted proposals, gaining first insights into how to construct safe proposals for $\Delta\text{Ex}$.
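The difference representation can be illustrated with a one-dimensional toy example. This is a sketch of the general idea, not the $\Delta\text{Ex}$ estimator as defined in the paper: the target is written as $p(x) = w_+ p_+(x) - w_- p_-(x)$, and the expectation is estimated as the difference of two ordinary importance-sampling estimates, one per component, so no sample from $p$ itself is needed. The specific mixture (two zero-mean Gaussians), the proposal widths, and all names are assumptions made for illustration.

```python
import numpy as np

def normal_pdf(x, sigma):
    """Density of a zero-mean Gaussian with standard deviation sigma."""
    return np.exp(-0.5 * (x / sigma) ** 2) / (sigma * np.sqrt(2.0 * np.pi))

# Toy subtractive mixture: p(x) = 2 N(x; 0, 2^2) - 1 N(x; 0, 1^2).
# It is non-negative everywhere and integrates to 2 - 1 = 1.
W_POS, SIGMA_POS = 2.0, 2.0
W_NEG, SIGMA_NEG = 1.0, 1.0

def smm_pdf(x):
    return W_POS * normal_pdf(x, SIGMA_POS) - W_NEG * normal_pdf(x, SIGMA_NEG)

def difference_is_estimate(f, n, rng, sigma_q_pos=2.5, sigma_q_neg=1.5):
    """Estimate E_p[f] as w_+ * IS(p_+) - w_- * IS(p_-), with Gaussian proposals."""
    x_pos = rng.normal(0.0, sigma_q_pos, n)   # proposal for the positive part
    x_neg = rng.normal(0.0, sigma_q_neg, n)   # proposal for the negative part
    term_pos = np.mean(f(x_pos) * normal_pdf(x_pos, SIGMA_POS)
                       / normal_pdf(x_pos, sigma_q_pos))
    term_neg = np.mean(f(x_neg) * normal_pdf(x_neg, SIGMA_NEG)
                       / normal_pdf(x_neg, sigma_q_neg))
    return W_POS * term_pos - W_NEG * term_neg

rng = np.random.default_rng(0)
estimate = difference_is_estimate(lambda x: x ** 2, 200_000, rng)
# Exact value for comparison: E_p[x^2] = 2 * 4 - 1 * 1 = 7
```

Each term is an unbiased importance-sampling estimate, so their weighted difference is unbiased as well. The proposals here are deliberately wider than the components they target, which keeps the importance weights bounded.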