
UAI.2025 - Poster

Total: 209

#1 FeDCM: Federated Learning of Deep Causal Generative Models

Authors: Md Musfiqur Rahman, Murat Kocaoglu

In many real-world settings, such as medicine and finance, the causal effect is a valuable metric for decision making. For many predictive tasks, causal mechanisms provide robust estimators, while existing ML-driven predictors might be vulnerable to spurious correlations. In such settings, when data is decentralized and privacy must be preserved, federated learning plays an important role. However, causal inference in a federated learning setup is a largely unexplored research area. In this paper, we learn a proxy of the underlying structural causal model (SCM) with deep generative models from decentralized observational data sources possibly containing high-dimensional variables. Based on client preference or the high dimensionality of variables, we modularize the SCM mechanisms and find the minimal subset appropriate for federated learning, while the rest of the mechanisms are trained on each client's local data. When all connected together, the proxy SCM, named the federated deep causal generative model (FeDCM), offers estimation of any identifiable causal effect. We perform extensive experiments to illustrate the utility and performance of our approach.

Subject: UAI.2025 - Poster


#2 Residual Reweighted Conformal Prediction for Graph Neural Networks

Authors: Zheng Zhang, Jie Bao, Zhixin Zhou, nicolo colombo, Lixin Cheng, Rui Luo

Graph Neural Networks (GNNs) excel at modeling relational data but face significant challenges in high-stakes domains due to unquantified uncertainty. Conformal prediction (CP) offers statistical coverage guarantees, but existing methods often produce overly conservative prediction intervals that fail to account for graph heteroscedasticity and structural biases. While residual reweighting CP variants address some of these limitations, they neglect graph topology, cluster-specific uncertainties, and risk data leakage by reusing training sets. To address these issues, we propose Residual Reweighted GNN (RR-GNN), a framework designed to generate minimal prediction sets with provable marginal coverage guarantees. RR-GNN introduces three major innovations to enhance prediction performance. First, it employs Graph-Structured Mondrian CP to partition nodes or edges into communities based on topological features, ensuring cluster-conditional coverage that reflects heterogeneity. Second, it uses Residual-Adaptive Nonconformity Scores by training a secondary GNN on a held-out calibration set to estimate task-specific residuals, dynamically adjusting prediction intervals according to node or edge uncertainty. Third, it adopts a Cross-Training Protocol, which alternates the optimization of the primary GNN and the residual predictor to prevent information leakage while maintaining graph dependencies. We validate RR-GNN on 15 real-world graphs across diverse tasks, including node classification, regression, and edge weight prediction. Compared to CP baselines, RR-GNN achieves improved efficiency over state-of-the-art methods, with no loss of coverage.
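A minimal sketch of the cluster-conditional (Mondrian) calibration idea described above, in Python. The arrays of calibration scores, community labels, and the per-node uncertainty estimate are hypothetical placeholders; the sketch illustrates the general split-conformal mechanics, not the exact RR-GNN procedure.

```python
import numpy as np

def mondrian_quantiles(scores, groups, alpha=0.1):
    """Per-community conformal quantiles from calibration nonconformity scores."""
    q = {}
    for g in np.unique(groups):
        s = np.sort(scores[groups == g])
        n = len(s)
        k = int(np.ceil((n + 1) * (1 - alpha))) - 1   # finite-sample corrected index
        q[g] = s[min(k, n - 1)]
    return q

def predict_interval(y_hat, sigma_hat, group, q):
    """Residual-adaptive interval: half-width is the community quantile scaled by
    an estimated per-node uncertainty sigma_hat (e.g., from a secondary model)."""
    half = q[group] * sigma_hat
    return y_hat - half, y_hat + half

# toy usage with synthetic calibration data
rng = np.random.default_rng(0)
scores = np.abs(rng.normal(size=300))      # |residual| / sigma_hat on calibration nodes
groups = rng.integers(0, 3, size=300)      # community labels from a graph clustering
q = mondrian_quantiles(scores, groups, alpha=0.1)
print(predict_interval(y_hat=1.5, sigma_hat=0.4, group=2, q=q))
```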

Subject: UAI.2025 - Poster


#3 Conformal Prediction without Nonconformity Scores

Authors: Jonas Hanselle, Alireza Javanmardi, Tobias Florin Oberkofler, Yusuf Sale, Eyke Hüllermeier

Conformal prediction (CP) is an uncertainty quantification framework that allows for constructing statistically valid prediction sets. Key to the construction of these sets is the notion of a nonconformity function, which assigns a real-valued score to individual data points: only those (hypothetical) data points that sufficiently conform to the data contribute to a prediction set. The point of departure of this work is the observation that CP predictions are invariant under (strictly) monotone transformations of the nonconformity function. In other words, it is only the ordering of the scores that matters, not their quantitative values. Consequently, instead of scoring individual data points, a conformal predictor only needs to be able to compare pairs of data points, deciding which of them is the more conforming one. This suggests an interesting connection between CP and preference learning, in particular learning-to-rank methods, and makes CP amenable to training data in the form of (qualitative) preferences. Elaborating on this connection, we propose methods for preference-based CP and show their usefulness in real-world classification tasks.
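Because only the ordering of nonconformity scores matters, a conformal predictor can be driven by a pairwise comparator instead of a score function. A minimal sketch under that assumption (the comparator interface is hypothetical and not the paper's specific preference-learning method):

```python
def conformal_p_value(strictly_more_conforming, calibration, test):
    """Split-conformal p-value from pairwise comparisons only:
    strictly_more_conforming(a, b) is True iff a conforms strictly better than b.
    Invariance to monotone score transformations means this ordering suffices."""
    worse_or_tied = sum(not strictly_more_conforming(c, test) for c in calibration)
    return (worse_or_tied + 1) / (len(calibration) + 1)

def prediction_set(candidate_labels, calibration, x, strictly_more_conforming, alpha=0.1):
    """Keep every candidate label y whose hypothetical pair (x, y) is sufficiently conforming."""
    return [y for y in candidate_labels
            if conformal_p_value(strictly_more_conforming, calibration, (x, y)) > alpha]
```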

Subject: UAI.2025 - Poster


#4 LoSAM: Local Search in Additive Noise Models with Mixed Mechanisms and General Noise for Global Causal Discovery

Authors: Sujai Hiremath, PROMIT GHOSAL, Kyra Gan

Inferring causal relationships from observational data is crucial when experiments are costly or infeasible. Additive noise models (ANMs) enable unique directed acyclic graph (DAG) identification, but existing sample-efficient ANM methods often rely on restrictive assumptions on the data generating process, limiting their applicability to real-world settings. We propose local search in additive noise models, LoSAM, a topological ordering method for learning a unique DAG in ANMs with mixed causal mechanisms and general noise distributions. We introduce new causal substructures and criteria for identifying roots and leaves, enabling efficient top-down learning. We prove asymptotic consistency and polynomial runtime, ensuring scalability and sample efficiency. We test LoSAM on synthetic and real-world data, demonstrating state-of-the-art performance across all mixed mechanism settings.

Subject: UAI.2025 - Poster


#5 Flow-Based Delayed Hawkes Process

Authors: Chao Yang, Wendi Ren, Shuang Li

Multivariate Hawkes processes are classic temporal point process models for event data. These models are simple and parametric in nature, offering interpretability by capturing the triggering effects between event types. However, these parametric models often struggle with low model capacity, limiting their expressive power to capture heterogeneous data patterns influenced by latent variables. In this paper, we propose a simple yet powerful extension: the Flow-based Delayed Hawkes Process, which integrates Normalizing Flows as a generative model to parameterize the Hawkes process. By generating all model parameters through the flow-based network, our approach significantly improves flexibility and expressiveness while preserving interpretability. We provide theoretical guarantees by proving the identifiability of the model parameters and the consistency of the maximum likelihood estimator under mild assumptions. Extensive experiments on both synthetic and real-world datasets show that our model outperforms existing baselines in capturing intricate and heterogeneous event dynamics.
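For reference, the standard multivariate Hawkes intensity with exponential excitation kernels is shown below; on a hedged reading of the abstract, the Flow-based Delayed Hawkes Process would have these parameters produced by a flow-based network rather than fixed as scalars.

```latex
% intensity of event type k given the history of past events (t_i, k_i)
\lambda_k(t) \;=\; \mu_k \;+\; \sum_{t_i < t} \alpha_{k, k_i}\, e^{-\beta_{k, k_i}\,(t - t_i)},
\qquad k = 1, \dots, K.
```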

Subject: UAI.2025 - Poster


#6 Provably Adaptive Average Reward Reinforcement Learning for Metric Spaces

Authors: Avik Kar, Rahul Singh

We study infinite-horizon average-reward reinforcement learning (RL) for Lipschitz MDPs, a broad class that subsumes several important classes such as linear MDPs, RKHS MDPs, and function approximation frameworks, and develop an adaptive algorithm ZoRL with regret bounded as $\mathcal{O}\big(T^{1 - d_{\text{eff.}}^{-1}}\big)$, where $d_{\text{eff.}} = 2d_\mathcal{S} + d_z + 3$, $d_\mathcal{S}$ is the dimension of the state space, and $d_z$ is the zooming dimension. In contrast, algorithms with fixed discretization yield $d_{\text{eff.}} = 2(d_\mathcal{S} + d_\mathcal{A}) + 2$, $d_\mathcal{A}$ being the dimension of the action space. ZoRL achieves this by discretizing the state-action space adaptively and zooming into "promising regions" of the state-action space. The zooming dimension $d_z$, a problem-dependent quantity bounded by the state-action space's dimension, allows us to conclude that if an MDP is benign, then the regret of ZoRL will be small. The zooming dimension and ZoRL are truly adaptive, i.e., the current work shows how to capture adaptivity gains for infinite-horizon average-reward RL. ZoRL outperforms other state-of-the-art algorithms in experiments, thereby demonstrating the gains arising due to adaptivity.

Subject: UAI.2025 - Poster


#7 Learning Algorithms for Multiple Instance Regression

Authors: Aaryan Gupta, Rishi Saket

Multiple instance regression (MIR), introduced by Ray and Page [2001], is a generalisation of supervised regression in which the training data is available as bags of feature-vectors (instances), and each bag carries a bag-label which matches the label of one (unknown) primary instance from that bag. The goal is to compute a hypothesis regressor consistent with the underlying instance-labels. While most works on MIR focus on training models on such data, the computational learnability of MIR was only recently explored by Chauhan et al. [UAI 2024], who showed worst-case intractability of properly learning *linear regressors* in MIR via an inapproximability bound. However, their work did not rule out efficient algorithms for this problem on natural distributions and randomly chosen labels. In this work we show that it is indeed possible to efficiently learn linear regressors in MIR when given access to random bags in which the feature vectors are independently sampled from Gaussian distributions and the bag-label is that of a uniformly randomly chosen primary instance. This is achieved by optimizing a certain bag-level loss which, via concentration bounds, yields a close approximation to the target linear regressor. Lastly, we show that the bag-level loss is also useful for learning general concepts (e.g., neural networks) in this setting: an optimizer of the loss on sampled bags is, w.h.p., a close approximation of a scaled version of the target function. We include experimental evaluations of our learning algorithms on synthetic and real-world datasets showing that our method outperforms the baseline MIR methods.

Subject: UAI.2025 - Poster


#8 Efficient Algorithms for Logistic Contextual Slate Bandits with Bandit Feedback

Authors: Tanmay Goyal, Gaurav Sinha

We study the Logistic Contextual Slate Bandit problem, where, at each round, an agent selects a slate of $N$ items from an exponentially large set (of size $2^{\Omega(N)}$) of candidate slates provided by the environment. A single binary reward, determined by a logistic model, is observed for the chosen slate. Our objective is to develop algorithms that maximize cumulative reward over $T$ rounds while maintaining low per-round computational costs. We propose two algorithms, Slate-GLM-OFU and Slate-GLM-TS, that accomplish this goal. These algorithms achieve $N^{O(1)}$ per-round time complexity via local planning (independent slot selections), and low regret through global learning (joint parameter estimation). We provide theoretical and empirical evidence supporting these claims. Under a well-studied diversity assumption, we prove that Slate-GLM-OFU incurs only $\tilde{O}(\sqrt{T})$ regret. Extensive experiments across a wide range of synthetic settings demonstrate that our algorithms consistently outperform state-of-the-art baselines, achieving both the lowest regret and the fastest runtime. Furthermore, we apply our algorithm to select in-context examples in prompts of Language Models for solving binary classification tasks such as sentiment analysis. Our approach achieves competitive test accuracy, making it a viable alternative in practical scenarios.

Subject: UAI.2025 - Poster


#9 Beyond Sin-Squared Error: Linear Time Entrywise Uncertainty Quantification for Streaming PCA

Authors: Syamantak Kumar, Shourya Pandey, Purnamrita Sarkar

We propose a novel statistical inference framework for streaming principal component analysis (PCA) using Oja’s algorithm, enabling the construction of confidence intervals for individual entries of the estimated eigenvector. Most existing works on streaming PCA focus on providing sharp sin-squared error guarantees. Recently, there has been some interest in uncertainty quantification for the sin-squared error. However, uncertainty quantification or sharp error guarantees for _entries of the estimated eigenvector_ in the streaming setting remains largely unexplored. We derive a sharp Bernstein-type concentration bound for elements of the estimated vector matching the optimal error rate up to logarithmic factors. We also establish a Central Limit Theorem for a suitably centered and scaled subset of the entries. To efficiently estimate the coordinate-wise variance, we introduce a provably consistent subsampling algorithm that leverages the median-of-means approach, empirically achieving similar accuracy to multiplier bootstrap methods while being significantly more computationally efficient. Numerical experiments demonstrate its effectiveness in providing reliable uncertainty estimates with a fraction of the computational cost of existing methods.
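For context, Oja's algorithm is the classic streaming estimator of the leading eigenvector, and median-of-means is a standard robust aggregate. The sketch below shows both in their generic form (step size, initialization, and the variance-estimation details are placeholders, not the paper's constructions):

```python
import numpy as np

def oja_streaming_pca(stream, d, eta=0.005, seed=0):
    """Oja's algorithm: one pass over data vectors x_t in R^d, estimating the top
    eigenvector of their covariance."""
    w = np.random.default_rng(seed).normal(size=d)
    w /= np.linalg.norm(w)
    for x in stream:
        w += eta * x * (x @ w)      # rank-one stochastic gradient step
        w /= np.linalg.norm(w)      # renormalize to the unit sphere
    return w

def median_of_means(values, n_blocks=10):
    """Generic median-of-means aggregate (the paper's subsampling scheme for
    coordinate-wise variances is more involved)."""
    blocks = np.array_split(np.asarray(values), n_blocks)
    return float(np.median([b.mean() for b in blocks]))

# toy usage: anisotropic stream whose top eigenvector is the first coordinate axis
rng = np.random.default_rng(1)
data = rng.normal(size=(5000, 20)) @ np.diag(np.linspace(2.0, 0.5, 20))
v_hat = oja_streaming_pca(iter(data), d=20)
```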

Subject: UAI.2025 - Poster


#10 RCAP: Robust, Class-Aware, Probabilistic Dynamic Dataset Pruning

Authors: Atif Hassan, Swanand Khare, Jiaul H. Paik

Dynamic data pruning techniques aim to reduce computational cost while minimizing information loss, by periodically selecting representative subsets of the input data during model training. However, existing methods often struggle to maintain strong worst-group accuracy, particularly at high pruning rates, across balanced and imbalanced datasets. To address this challenge, we propose RCAP, a Robust, Class-Aware, Probabilistic dynamic dataset pruning algorithm for classification tasks. RCAP applies a closed-form solution to estimate the fraction of samples to be included in the training subset for each individual class. This fraction is adaptively adjusted in every epoch using the class-wise aggregated loss. Thereafter, it employs an adaptive sampling strategy that prioritizes samples with high loss when populating the class-wise subsets. We evaluate RCAP on six diverse datasets ranging from class-balanced to highly imbalanced, using five distinct models across three training paradigms: training from scratch, transfer learning, and fine-tuning. Our approach consistently outperforms state-of-the-art dataset pruning methods, achieving superior worst-group accuracy at all pruning rates. Remarkably, with only 10% of the data, RCAP delivers a >1% improvement in performance on class-imbalanced datasets compared to full-data training, while providing an average $8.69\times$ speedup. The code can be accessed at https://github.com/atif-hassan/RCAP-dynamic-dataset-pruning
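A hypothetical sketch of the class-aware pruning step, in Python. The abstract does not give the closed-form budget rule, so the allocation below (budgets proportional to class-wise aggregated loss, then loss-proportional sampling within each class) is only a stand-in for the actual RCAP rule; it assumes strictly positive per-sample losses.

```python
import numpy as np

def select_subset(losses, labels, keep_ratio=0.1, rng=None):
    """Pick a training subset for the next epoch: per-class budgets from aggregated
    loss, then sample within each class with probability proportional to loss."""
    rng = rng or np.random.default_rng(0)
    losses, labels = np.asarray(losses, dtype=float), np.asarray(labels)
    n_keep = int(keep_ratio * len(losses))
    classes = np.unique(labels)
    class_loss = np.array([losses[labels == c].mean() for c in classes])
    budget = np.maximum(1, (n_keep * class_loss / class_loss.sum()).astype(int))
    keep = []
    for c, b in zip(classes, budget):
        idx = np.where(labels == c)[0]
        p = losses[idx] / losses[idx].sum()
        keep.extend(rng.choice(idx, size=min(b, len(idx)), replace=False, p=p))
    return np.array(keep)
```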

Subject: UAI.2025 - Poster


#11 Approximate Bayesian Inference via Bitstring Representations

Authors: Aleksanteri Sladek, Martin Trapp, Arno Solin

The machine learning community has recently put effort into quantized or low-precision arithmetic to scale large models. This paper proposes performing probabilistic inference in the quantized, discrete parameter space created by these representations, effectively enabling us to learn a continuous distribution using discrete parameters. We consider both 2D densities and quantized neural networks, where we introduce a tractable learning approach using probabilistic circuits. This method offers a scalable solution to manage complex distributions and provides clear insights into model behavior. We validate our approach with various models, demonstrating inference efficiency without sacrificing accuracy. This work advances scalable, interpretable machine learning by utilizing discrete approximations for probabilistic computations.

Subject: UAI.2025 - Poster


#12 Multi-armed Bandits with Missing Outcomes

Authors: Ilia Mahrooghi, Mahshad Moradi, Sina Akbari, Negar Kiyavash

While significant progress has been made in designing algorithms that minimize regret in online decision-making, real-world scenarios often introduce additional complexities, with missing outcomes perhaps among the most challenging ones. Overlooking this aspect or simply assuming random missingness invariably leads to biased estimates of the rewards and may result in linear regret. Despite the practical relevance of this challenge, no rigorous methodology currently exists for systematically handling missingness, especially when the missingness mechanism is not random. In this paper, we address this gap in the context of multi-armed bandits (MAB) with missing outcomes by analyzing the impact of different missingness mechanisms on achievable regret bounds. We introduce algorithms that account for missingness under both missing at random (MAR) and missing not at random (MNAR) models. Through both analytical and simulation studies, we demonstrate the drastic improvements in decision-making by accounting for missingness in these settings.
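As background, one textbook way to debias reward estimates when outcomes are missing at random is inverse-propensity weighting; the sketch below folds it into a UCB index. It is purely illustrative of the MAR debiasing idea and is not one of the algorithms proposed in the paper (which also handles MNAR).

```python
import numpy as np

class IPWUCB:
    """UCB with inverse-propensity-weighted reward estimates, assuming outcomes are
    missing at random with a known observation probability p_obs per arm."""
    def __init__(self, n_arms, p_obs):
        self.p_obs = np.asarray(p_obs, dtype=float)
        self.pulls = np.zeros(n_arms)
        self.ipw_sum = np.zeros(n_arms)         # sum of reward / p_obs over observed rounds

    def select(self, t):
        if np.any(self.pulls == 0):
            return int(np.argmin(self.pulls))   # pull each arm once first
        mean = self.ipw_sum / self.pulls
        bonus = np.sqrt(2 * np.log(t + 1) / self.pulls) / self.p_obs  # heuristic inflation
        return int(np.argmax(mean + bonus))

    def update(self, arm, reward_or_none):
        self.pulls[arm] += 1
        if reward_or_none is not None:          # outcome observed
            self.ipw_sum[arm] += reward_or_none / self.p_obs[arm]
```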

Subject: UAI.2025 - Poster


#13 MindFlayer SGD: Efficient Parallel SGD in the Presence of Heterogeneous and Random Worker Compute Times

Authors: Arto Maranjyan, Omar Shaikh Omar, Peter Richtárik

We investigate the problem of minimizing the expectation of smooth nonconvex functions in a distributed setting with multiple parallel workers that are able to compute stochastic gradients. A significant challenge in this context is the presence of arbitrarily heterogeneous and stochastic compute times among workers, which can severely degrade the performance of existing parallel stochastic gradient descent (SGD) methods. While some parallel SGD algorithms achieve optimal performance under deterministic but heterogeneous delays, their effectiveness diminishes when compute times are random—a scenario not explicitly addressed in their design. To bridge this gap, we introduce MindFlayer SGD, a novel parallel SGD method specifically designed to handle stochastic and heterogeneous compute times. Through theoretical analysis and empirical evaluation, we demonstrate that MindFlayer SGD consistently outperforms existing baselines, particularly in environments with heavy-tailed noise. Our results highlight its robustness and scalability, making it a compelling choice for large-scale distributed learning tasks.

Subject: UAI.2025 - Poster


#14 Causal Models for Growing Networks

Authors: Gecia Bravo-Hermsdorff, Kayvan Sadeghi, Lee M. Gunderson

Real-world networks grow over time; statistical models based on node exchangeability are not appropriate. Instead of constraining the structure of the *distribution* of edges, we propose that the relevant symmetries refer to the *causal structure* between them. We first enumerate the 96 causal directed acyclic graph (DAG) models over pairs of nodes (dyad variables) in a growing network with finite ancestral sets that are invariant to node deletion. We then partition them into 21 classes with ancestral sets that are closed under node marginalization. Several of these classes are remarkably amenable to parallelization. As an example, we highlight a simple model that exhibits flexible power-law degree distributions and emergent phase transitions in sparsity, which we characterize analytically. With few parameters and much conditional independence, our proposed framework provides natural baseline models for causal inference in relational data.

Subject: UAI.2025 - Poster


#15 Offline Changepoint Detection With Gaussian Processes

Authors: Janneke Verbeek, Tom Heskes, Yuliya Shapovalova

This work proposes Segmenting changepoint Gaussian process regression (SegCPGP), an offline changepoint detection method that integrates Gaussian process regression with the changepoint kernel, the likelihood ratio test, and binary search. We use the spectral mixture kernel to detect various types of changes without prior knowledge of their type. SegCPGP outperforms state-of-the-art methods when detecting various change types in synthetic datasets; on real-world changepoint detection datasets, it performs on par with its competitors. While its hypothesis test shows slight miscalibration, we find SegCPGP remains reasonably reliable.

Subject: UAI.2025 - Poster


#16 Partial-Label Learning with Conformal Candidate Cleaning

Authors: Tobias Fuchs, Florian Kalinke

Real-world data is often ambiguous; for example, human annotation produces instances with multiple conflicting class labels. Partial-label learning (PLL) aims at training a classifier in this challenging setting, where each instance is associated with a set of candidate labels and one correct, but unknown, class label. A multitude of algorithms targeting this setting exists and, to enhance their prediction quality, several extensions that are applicable across a wide range of PLL methods have been introduced. While many of these extensions rely on heuristics, this article proposes a novel enhancing method that incrementally prunes candidate sets using conformal prediction. To work around the missing labeled validation set, which is typically required for conformal prediction, we propose a strategy that alternates between training a PLL classifier to label the validation set, leveraging these predicted class labels for calibration, and pruning candidate labels that are not part of the resulting conformal sets. In this sense, our method alternates between empirical risk minimization and candidate set pruning. We establish that our pruning method preserves the conformal validity with respect to the unknown ground truth. Our extensive experiments on artificial and real-world data show that the proposed approach significantly improves the test set accuracies of several state-of-the-art PLL classifiers.
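A compact sketch of the alternation described above. The `fit`/`predict_proba` interface, the use of the maximum-probability nonconformity score, and the rule of never emptying a candidate set are assumptions made for illustration, not the exact procedure of the paper.

```python
import numpy as np

def conformal_candidate_cleaning(X_train, cand_train, X_val, fit, predict_proba,
                                 n_rounds=5, alpha=0.1):
    """Alternate between (i) training a PLL classifier on the current candidate sets,
    (ii) pseudo-labeling the validation set and calibrating a conformal threshold on it,
    and (iii) pruning training candidate labels that fall outside the conformal sets."""
    for _ in range(n_rounds):
        fit(X_train, cand_train)
        val_probs = predict_proba(X_val)
        pseudo = val_probs.argmax(axis=1)                          # predicted labels used for calibration
        scores = 1.0 - val_probs[np.arange(len(pseudo)), pseudo]   # nonconformity of pseudo-labels
        n = len(scores)
        q_level = min(1.0, np.ceil((n + 1) * (1 - alpha)) / n)
        qhat = np.quantile(scores, q_level)
        keep = (1.0 - predict_proba(X_train)) <= qhat              # labels inside conformal sets
        cand_train = [[y for y in cs if keep[i, y]] or cs          # never empty a candidate set
                      for i, cs in enumerate(cand_train)]
    return cand_train
```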

Subject: UAI.2025 - Poster


#17 Exploring Exploration in Bayesian Optimization

Authors: Leonard Papenmeier, Nuojin Cheng, Stephen Becker, Luigi Nardi

A well-balanced exploration-exploitation trade-off is crucial for successful acquisition functions in Bayesian optimization. However, there is a lack of quantitative measures for exploration, making it difficult to analyze and compare different acquisition functions. This work introduces two novel approaches – observation traveling salesman distance and observation entropy – to quantify the exploration characteristics of acquisition functions based on their selected observations. Using these measures, we examine the explorative nature of several well-known acquisition functions across a diverse set of black-box problems, uncover links between exploration and empirical performance, and reveal new relationships among existing acquisition functions. Beyond enabling a deeper understanding of acquisition functions, these measures also provide a foundation for guiding their design in a more principled and systematic manner.
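To make the two measures concrete, here is one plausible implementation: a greedy tour length over the selected observations as a cheap surrogate for the traveling-salesman distance, and the entropy of a histogram of observations over a normalized search space. The exact definitions used in the paper may differ.

```python
import numpy as np
from scipy.spatial.distance import pdist, squareform

def observation_tsp_distance(X):
    """Greedy nearest-neighbour tour length over observations X of shape (n, d)."""
    D = squareform(pdist(X))
    visited, cur, total = {0}, 0, 0.0
    while len(visited) < len(X):
        nxt = min((j for j in range(len(X)) if j not in visited), key=lambda j: D[cur, j])
        total += D[cur, nxt]
        visited.add(nxt)
        cur = nxt
    return total

def observation_entropy(X, bins=10):
    """Entropy of a histogram of observations, assuming X is scaled to [0, 1]^d."""
    H, _ = np.histogramdd(X, bins=bins, range=[(0.0, 1.0)] * X.shape[1])
    p = H.ravel() / H.sum()
    p = p[p > 0]
    return float(-(p * np.log(p)).sum())
```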

Subject: UAI.2025 - Poster


#18 Contrast-CAT: Contrasting Activations for Enhanced Interpretability in Transformer-based Text Classifiers

Authors: Sungmin Han, Jeonghyun Lee, Sangkyun Lee

Transformers have profoundly influenced AI research, but explaining their decisions remains challenging -- even for relatively simple tasks such as classification -- which hinders trust and safe deployment in real-world applications. Although activation-based attribution methods effectively explain transformer-based text classification models, our findings reveal that these methods can be undermined by class-irrelevant features within activations, leading to less reliable interpretations. To address this limitation, we propose Contrast-CAT, a novel activation contrast-based attribution method that refines token-level attributions by filtering out class-irrelevant features. By contrasting the activations of an input sequence with reference activations, Contrast-CAT generates clearer and more faithful attribution maps. Experimental results across various datasets and models confirm that Contrast-CAT consistently outperforms state-of-the-art methods. Notably, under the MoRF setting, it achieves average improvements of $1.30\times$ in AOPC and $2.25\times$ in LOdds over the most competitive methods, demonstrating its effectiveness in enhancing interpretability for transformer-based text classification.

Subject: UAI.2025 - Poster


#19 ELF: Federated Langevin Algorithms with Primal, Dual and Bidirectional Compression

Authors: Avetik Karagulyan, Peter Richtárik

Federated sampling algorithms have recently gained great popularity in the machine learning and statistics communities. This paper proposes a new family of federated sampling algorithms called Error Feedback Langevin algorithms (ELF). In particular, we analyze the combinations of EF21 and EF21-P with federated Langevin Monte Carlo. We propose three algorithms, P-ELF, D-ELF, and B-ELF, that use primal, dual, and bidirectional compressors, respectively. We analyze the proposed methods under the log-Sobolev inequality and provide non-asymptotic convergence guarantees. Simple experimental results support our theoretical findings.
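For intuition, the sketch below combines a Langevin Monte Carlo step with EF21-style error feedback (gradient tracking through a contractive compressor) in the single-worker case. It only illustrates the two building blocks; the P-ELF, D-ELF, and B-ELF algorithms operate over many clients with primal, dual, or bidirectional compression.

```python
import numpy as np

def top_k(v, k):
    """Top-k sparsification, a common contractive compressor."""
    out = np.zeros_like(v)
    idx = np.argsort(np.abs(v))[-k:]
    out[idx] = v[idx]
    return out

def ef21_langevin(grad, x0, gamma=1e-2, k=10, steps=1000, rng=None):
    """Langevin updates driven by an EF21-tracked gradient estimate g, which is
    corrected at each step by the compressed difference C(grad(x) - g)."""
    rng = rng or np.random.default_rng(0)
    x, g = x0.copy(), grad(x0)
    for _ in range(steps):
        x = x - gamma * g + np.sqrt(2 * gamma) * rng.normal(size=x.shape)  # Langevin step
        g = g + top_k(grad(x) - g, k)                                      # EF21 tracking
    return x
```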

Subject: UAI.2025 - Poster


#20 Fast Non-convex Matrix Sensing with Optimal Sample Complexity

Authors: Jian-Feng Cai, Tong Wu, Ruizhe Xia

We study the problem of recovering an unknown $d_1 \times d_2$ rank-$r$ matrix from $m$ random linear measurements. Convex methods achieve the optimal sample complexity $m = \Omega(r(d_1 + d_2))$ but are computationally expensive. Non-convex approaches, while more computationally efficient, often require suboptimal sample complexity $m = \Omega(r^2(d_1 + d_2))$. A recent advance achieves $m = \Omega(rd_1)$ for a non-convex approach, but it relies on the restrictive assumption of positive semidefinite (PSD) matrices and suffers from slow convergence in ill-conditioned settings. Bridging this gap, we show that Riemannian gradient descent (RGD) achieves both optimal sample complexity and computational efficiency without requiring the PSD assumption. Specifically, for Gaussian measurements, RGD exactly recovers the low-rank matrix with $m = \Omega(r(d_1 + d_2))$, matching the information-theoretic lower bound, and converges linearly to the global minimum with an arbitrarily small convergence rate.
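A rough sketch of Riemannian gradient descent on the fixed-rank manifold for the least-squares sensing objective, with the Euclidean gradient projected onto the tangent space and a truncated-SVD retraction. Initialization and step size are simplistic placeholders rather than the paper's choices.

```python
import numpy as np

def rank_r_svd(M, r):
    """Best rank-r approximation via truncated SVD."""
    U, s, Vt = np.linalg.svd(M, full_matrices=False)
    return U[:, :r] @ np.diag(s[:r]) @ Vt[:r]

def rgd_matrix_sensing(A, y, r, eta=0.5, iters=200):
    """Minimize 0.5 * sum_i (<A_i, X> - y_i)^2 over rank-r matrices.
    A: list of measurement matrices, y: measurements."""
    m = len(y)
    X = rank_r_svd(sum(yi * Ai for Ai, yi in zip(A, y)) / m, r)        # spectral initialization
    for _ in range(iters):
        resid = np.array([np.vdot(Ai, X) - yi for Ai, yi in zip(A, y)])
        G = sum(ri * Ai for ri, Ai in zip(resid, A)) / m               # Euclidean gradient
        U, _, Vt = np.linalg.svd(X, full_matrices=False)
        U, V = U[:, :r], Vt[:r].T
        PG = U @ (U.T @ G) + (G @ V) @ V.T - U @ (U.T @ G @ V) @ V.T   # tangent-space projection
        X = rank_r_svd(X - eta * PG, r)                                # retraction by truncated SVD
    return X
```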

Subject: UAI.2025 - Poster


#21 Nonlinear Causal Discovery for Grouped Data

Authors: Konstantin Göbler, Tobias Windisch, Mathias Drton

Inferring cause-effect relationships from observational data has gained significant attention in recent years, but most methods are limited to scalar random variables. In many important domains, including neuroscience, psychology, social science, and industrial manufacturing, the causal units of interest are groups of variables rather than individual scalar measurements. Motivated by these applications, we extend nonlinear additive noise models to handle random vectors, establishing a two-step approach for causal graph learning: First, infer the causal order among random vectors. Second, perform model selection to identify the best graph consistent with this order. We introduce effective and novel solutions for both steps in the vector case, demonstrating strong performance in simulations. Finally, we apply our method to real-world assembly line data with partial knowledge of causal ordering among variable groups.

Subject: UAI.2025 - Poster


#22 Error Bounds for Physics-Informed Neural Networks in Fokker-Planck PDEs

Authors: Chun-Wei Kong, Luca Laurenti, Jay McMahon, Morteza Lahijanian

Stochastic differential equations are commonly used to describe the evolution of stochastic processes. The state uncertainty of such processes is best represented by the probability density function (PDF), whose evolution is governed by the Fokker-Planck partial differential equation (FP-PDE). However, it is generally infeasible to solve the FP-PDE in closed form. In this work, we show that physics-informed neural networks (PINNs) can be trained to approximate the solution PDF. Our main contribution is the analysis of PINN approximation error: we develop a theoretical framework to construct tight error bounds using PINNs. In addition, we derive a practical error bound that can be efficiently constructed with standard training methods. We discuss that this error-bound framework generalizes to approximate solutions of other linear PDEs. Empirical results on nonlinear, high-dimensional, and chaotic systems validate the correctness of our error bounds while demonstrating the scalability of PINNs and their significant computational speedup in obtaining accurate PDF solutions compared to the Monte Carlo approach.
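As a concrete example of the training signal involved, the residual of a one-dimensional Fokker-Planck equation can be formed with automatic differentiation as below; the network, drift, and diffusion are generic placeholders, and the paper's error-bound construction involves more than this residual loss.

```python
import torch

def fokker_planck_residual(net, t, x, f, sigma):
    """Residual of dp/dt = -d/dx(f(x) p) + 0.5 * sigma^2 * d^2p/dx^2 at collocation
    points (t, x), where net(t, x) approximates the density p(t, x)."""
    t = t.requires_grad_(True)
    x = x.requires_grad_(True)
    p = net(torch.cat([t, x], dim=1))
    grad = lambda out, var: torch.autograd.grad(
        out, var, grad_outputs=torch.ones_like(out), create_graph=True)[0]
    p_t = grad(p, t)
    flux_x = grad(f(x) * p, x)          # d/dx of the probability flux f(x) p
    p_xx = grad(grad(p, x), x)          # second spatial derivative
    return p_t + flux_x - 0.5 * sigma ** 2 * p_xx   # vanishes when p solves the PDE

# a PINN would minimize (residual ** 2).mean() plus initial- and boundary-condition terms
```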

Subject: UAI.2025 - Poster


#23 On the Privacy Risks of Spiking Neural Networks: A Membership Inference Analysis

Authors: Junyi Guan, Abhijith Sharma, Chong Tian, Salem Lahlou

Spiking Neural Networks (SNNs) are increasingly explored for their energy efficiency and robustness in real-world applications, yet their privacy risks remain largely unexamined. In this work, we investigate the susceptibility of SNNs to Membership Inference Attacks (MIAs), a major privacy threat where an adversary attempts to determine whether a given sample was part of the training dataset. While prior work suggests that SNNs may offer inherent robustness due to their discrete, event-driven nature, we find that this resilience diminishes as the latency ($T$) increases. Furthermore, we introduce an input dropout strategy under the black-box setting that significantly enhances membership inference in SNNs. Our findings challenge the assumption that SNNs are inherently more secure: although they are expected to be more robust, our results reveal that SNNs exhibit privacy vulnerabilities comparable to those of Artificial Neural Networks (ANNs).
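The abstract does not spell out the input dropout attack; one plausible black-box reading is to query the model on several dropout-perturbed copies of the input and threshold a confidence statistic, as sketched below. The interface and scoring rule are assumptions for illustration, not the authors' exact attack.

```python
import numpy as np

def dropout_membership_score(predict_proba, x, y, n_queries=16, drop_rate=0.2, rng=None):
    """Average confidence on the true class y over randomly masked copies of input x;
    higher scores are taken as evidence of training-set membership."""
    rng = rng or np.random.default_rng(0)
    confs = []
    for _ in range(n_queries):
        mask = rng.random(x.shape) > drop_rate   # randomly drop input features / spikes
        confs.append(predict_proba(x * mask)[y])
    return float(np.mean(confs))

# decision rule: predict "member" if the score exceeds a threshold calibrated on shadow data
```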

Subject: UAI.2025 - Poster


#24 InfoDPCCA: Information-Theoretic Dynamic Probabilistic Canonical Correlation Analysis

Authors: Shiqin Tang, Shujian Yu

Extracting meaningful latent representations from high-dimensional sequential data is a crucial challenge in machine learning, with applications spanning natural science and engineering. We introduce InfoDPCCA, a dynamic probabilistic Canonical Correlation Analysis (CCA) framework designed to model two interdependent sequences of observations. InfoDPCCA leverages a novel information-theoretic objective to extract a shared latent representation that captures the mutual structure between the data streams and balances representation compression and predictive sufficiency while also learning separate latent components that encode information specific to each sequence. Unlike prior dynamic CCA models, such as DPCCA, our approach explicitly enforces the shared latent space to encode only the mutual information between the sequences, improving interpretability and robustness. We further introduce a two-step training scheme to bridge the gap between information-theoretic representation learning and generative modeling, along with a residual connection mechanism to enhance training stability. Through experiments on synthetic and medical fMRI data, we demonstrate that InfoDPCCA excels as a tool for representation learning. Code of InfoDPCCA is available at https://github.com/marcusstang/InfoDPCCA.

Subject: UAI.2025 - Poster


#25 Hindsight Merging: Diverse Data Generation with Language Models

Authors: Veniamin Veselovsky, Benedikt Stroebl, Gianluca Bencomo, Dilip Arumugam, Lisa Schut, Arvind Narayanan, Thomas L. Griffiths

Pre-training a language model equips it with a broad understanding of the world, while fine-tuning refines it into a helpful assistant. However, fine-tuning does not exclusively enhance task-specific behaviors but also suppresses some of the beneficial variability from pre-training. This reduction in diversity is partly due to the optimization process, which theoretically decreases model entropy in exchange for task performance. To counteract this, we introduce hindsight merging, a technique that combines a fine-tuned model with a previous training checkpoint using linear interpolation to restore entropy and improve performance. Hindsight-merged models retain strong instruction-following capabilities and alignment while displaying the increased diversity present in the base model. Additionally, this results in improved inference scaling, achieving a consistent 20-50% increase in pass@10 relative to the instruction-tuned model across a coding benchmark and a series of models. Our findings suggest that hindsight merging is an effective strategy for generating diverse outputs that follow instructions.
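The merging step itself is a plain linear interpolation of parameters, as in the sketch below (PyTorch state dicts of two checkpoints of the same architecture; the mixing weight alpha is a hyperparameter not fixed by the abstract).

```python
import torch

def hindsight_merge(finetuned_state, checkpoint_state, alpha=0.5):
    """theta_merged = alpha * theta_finetuned + (1 - alpha) * theta_checkpoint,
    applied parameter-by-parameter over matching state-dict keys."""
    return {k: alpha * finetuned_state[k] + (1 - alpha) * checkpoint_state[k]
            for k in finetuned_state}

# usage sketch:
# merged = hindsight_merge(model_ft.state_dict(), model_ckpt.state_dict(), alpha=0.5)
# model_ft.load_state_dict(merged)
```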

Subject: UAI.2025 - Poster