UAI.2024 | Total: 200

#1 Inference in Probabilistic Answer Set Programs with Imprecise Probabilities via Optimization

Authors: Damiano Azzolini, Fabrizio Riguzzi

Probabilistic answer set programming has recently been extended to manage imprecise probabilities by means of credal probabilistic facts and credal annotated disjunctions. This increases the expressivity of the language but also the cost of inference. In this paper, we cast inference in probabilistic answer set programs with credal probabilistic facts and credal annotated disjunctions as a constrained nonlinear optimization problem where the function to optimize is obtained via knowledge compilation. Empirical results on different datasets with multiple configurations show the effectiveness of our approach.
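
The reduction can be pictured with a toy example: knowledge compilation turns the query probability into a multilinear function of the probabilities attached to the credal facts, and the bounds on the query follow from optimizing that function over the credal intervals. Below is a minimal sketch under that reading, where `query_prob` is a hypothetical stand-in for the compiled function of a program with two credal facts (not the authors' implementation):

```python
import numpy as np
from scipy.optimize import minimize

def query_prob(p):
    # Hypothetical compiled circuit: P(q) as a multilinear function of the
    # probabilities of the two credal facts.
    return p[0] * p[1] + (1.0 - p[0]) * 0.3

bounds = [(0.4, 0.7), (0.2, 0.9)]          # credal intervals of the two facts
x0 = np.array([lo for lo, hi in bounds])   # start at the lower bounds

lower = minimize(query_prob, x0, bounds=bounds).fun
upper = -minimize(lambda p: -query_prob(p), x0, bounds=bounds).fun
print(f"P(q) lies in [{lower:.3f}, {upper:.3f}]")
```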

Subject: UAI.2024 - Oral


#2 Efficient Monte Carlo Tree Search via On-the-Fly State-Conditioned Action Abstraction

Authors: Yunhyeok Kwak, Inwoo Hwang, Dooyoung Kim, Sanghack Lee, Byoung-Tak Zhang

Monte Carlo Tree Search (MCTS) has showcased its efficacy across a broad spectrum of decision-making problems. However, its performance often degrades in vast combinatorial action spaces, especially where an action is composed of multiple sub-actions. In this work, we propose an action abstraction based on the compositional structure between a state and sub-actions for improving the efficiency of MCTS under a factored action space. Our method learns a latent dynamics model with an auxiliary network that captures the sub-actions relevant to the transition from the current state, which we call state-conditioned action abstraction. Notably, it infers such compositional relationships from high-dimensional observations without access to a known environment model. During tree traversal, our method constructs the state-conditioned action abstraction for each node on the fly, reducing the search space by discarding the exploration of redundant sub-actions. Experimental results demonstrate the superior sample efficiency of our method compared to vanilla MuZero, which suffers from the expansive action space.
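
A minimal sketch of the idea, with hypothetical components: `relevance_net` scores, for the current latent state, which sub-action slots affect the transition; the search then only branches on the relevant slots.

```python
import itertools
import torch

def state_conditioned_abstraction(latent_state, relevance_net,
                                  sub_action_spaces, tau=0.5):
    scores = torch.sigmoid(relevance_net(latent_state))  # one score per slot
    relevant = {i for i, s in enumerate(scores) if s.item() > tau}
    # Irrelevant slots are pinned to a default sub-action instead of being
    # enumerated, shrinking the node's branching factor.
    choices = [space if i in relevant else space[:1]
               for i, space in enumerate(sub_action_spaces)]
    return list(itertools.product(*choices))
```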

Subject: UAI.2024 - Oral


#3 Towards Minimax Optimality of Model-based Robust Reinforcement Learning

Authors: Pierre Clavier, Erwan Le Pennec, Matthieu Geist

We study the sample complexity of obtaining an $\epsilon$-optimal policy in Robust discounted Markov Decision Processes (RMDPs), given only access to a generative model of the nominal kernel. This problem is widely studied in the non-robust case, and it is known that any planning approach applied to an empirical MDP estimated with $\tilde{O}(H^3 |S||A| / \epsilon^2)$ samples provides an $\epsilon$-optimal policy, which is minimax optimal. Results in the robust case are much scarcer. For $sa$- (resp. $s$-) rectangular uncertainty sets, until recently the best-known sample complexity was $\tilde{O}(H^4 |S|^2 |A| / \epsilon^2)$ (resp. $\tilde{O}(H^4 |S|^2 |A|^2 / \epsilon^2)$), for specific algorithms and when the uncertainty set is based on the total variation (TV), KL, or chi-square divergence. In this paper, we consider uncertainty sets defined with an $L_p$-ball (recovering the TV case), and study the sample complexity of any planning algorithm (with a high-accuracy guarantee on the solution) applied to an empirical RMDP estimated using the generative model. In the general case, we prove a sample complexity of $\tilde{O}(H^4 |S||A| / \epsilon^2)$ for both the $sa$- and $s$-rectangular cases (improvements of $|S|$ and $|S||A|$ respectively). When the size of the uncertainty is small enough, we improve the sample complexity to $\tilde{O}(H^3 |S||A| / \epsilon^2)$, recovering for the first time the lower bound of the non-robust case, as well as matching a robust lower bound. Finally, we also introduce simple and efficient algorithms for solving the studied $L_p$ robust MDPs.
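
For the TV special case (the $L_1$ instance of the $L_p$-ball), the kind of planning problem being analyzed can be sketched as generic $sa$-rectangular robust value iteration on the empirical kernel. This is an illustrative sketch, not the paper's algorithms; the inner minimization over the TV ball is solved greedily:

```python
import numpy as np

def worst_case_tv(p, v, delta):
    # min_q q.v over distributions q with ||q - p||_1 <= delta: move up to
    # delta/2 of mass from the highest-value states to the lowest-value one.
    q, budget, lo = p.copy(), delta / 2.0, int(np.argmin(v))
    for i in np.argsort(v)[::-1]:          # highest values first
        if i == lo or budget <= 0:
            continue
        moved = min(q[i], budget)
        q[i] -= moved
        q[lo] += moved
        budget -= moved
    return q @ v

def robust_value_iteration(P_hat, R, gamma, delta, iters=500):
    # P_hat: empirical kernel (S, A, S); R: rewards (S, A); sa-rectangular set.
    S, A, _ = P_hat.shape
    v = np.zeros(S)
    for _ in range(iters):
        Q = np.array([[R[s, a] + gamma * worst_case_tv(P_hat[s, a], v, delta)
                       for a in range(A)] for s in range(S)])
        v = Q.max(axis=1)
    return v, Q.argmax(axis=1)
```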

Subject: UAI.2024 - Oral


#4 Inference for Optimal Linear Treatment Regimes in Personalized Decision-making

Authors: Yuwen Cheng, Shu Yang

Personalized decision-making, tailored to individual characteristics, is gaining significant attention. The optimal treatment regime aims to provide the best expected outcome in the entire population, known as the value function. One approach to determining this optimal regime is by maximizing the Augmented Inverse Probability Weighting (AIPW) estimator of the value function. However, the derived treatment regime can be intricate and nonlinear, limiting its use. For clarity and interpretability, we emphasize linear regimes and determine the optimal linear regime by optimizing the AIPW estimator within set constraints. While the AIPW estimator offers a viable path to estimating the optimal regime, current methodologies predominantly focus on its asymptotic distribution, leaving a gap in studying the linear regime itself. However, there are many benefits to understanding the regime, as pinpointing significant covariates can enhance treatment effects and provide future clinical guidance. In this paper, we explore the asymptotic distribution of the estimated linear regime. Our results show that the parameter associated with the linear regime follows a cube-root convergence to a non-normal limiting distribution characterized by the maximizer of a centered Gaussian process with a quadratic drift. When making inferences for estimated linear regimes with cube-root convergence in practical scenarios, the standard nonparametric bootstrap is invalid. As a solution, we employ the Cattaneo et al. (2020) bootstrap technique to provide a consistent distributional approximation for the estimated linear regimes, validated further through simulations and real-world data applications from the eICU Collaborative Research Database.
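
Concretely, the object being maximized can be sketched as follows, assuming the nuisance functions (propensity score `prop` and outcome regressions `mu0`, `mu1`) are already fitted; random search over unit-norm coefficients stands in here for the paper's constrained optimization:

```python
import numpy as np

def aipw_value(beta, X, A, Y, prop, mu0, mu1):
    # AIPW estimate of the value of the linear regime d(x) = 1{x @ beta > 0}.
    d = (X @ beta > 0).astype(int)
    mu_d = np.where(d == 1, mu1, mu0)         # predicted outcome under d
    p_d = np.where(d == 1, prop, 1 - prop)    # P(A = d(X) | X)
    mu_obs = np.where(A == 1, mu1, mu0)
    return np.mean((A == d) * (Y - mu_obs) / p_d + mu_d)

def best_linear_regime(X, A, Y, prop, mu0, mu1, n_draws=5000, seed=0):
    rng = np.random.default_rng(seed)
    betas = rng.standard_normal((n_draws, X.shape[1]))
    betas /= np.linalg.norm(betas, axis=1, keepdims=True)  # unit-norm constraint
    values = [aipw_value(b, X, A, Y, prop, mu0, mu1) for b in betas]
    return betas[int(np.argmax(values))]
```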

Subject: UAI.2024 - Oral


#5 Towards Bounding Causal Effects under Markov Equivalence

Author: Alexis Bellot

Predicting the effect of unseen interventions is a fundamental research question across the data sciences. It is well established that in general such questions cannot be answered definitively from observational data. This realization has fuelled a growing literature introducing various identifying assumptions, for example in the form of a causal diagram among relevant variables. In practice, this paradigm is still too rigid for many practical applications as it is generally not possible to confidently delineate the true causal diagram. In this paper, we consider the derivation of bounds on causal effects given only observational data. We propose to take as input a less informative structure known as a Partial Ancestral Graph, which represents a Markov equivalence class of causal diagrams and is learnable from data. In this more "data-driven" setting, we provide a systematic algorithm to derive bounds on causal effects that exploit the invariant properties of the equivalence class, and that can be computed analytically. We demonstrate our method with synthetic and real data examples.

Subject: UAI.2024 - Oral


#6 Optimistic Regret Bounds for Online Learning in Adversarial Markov Decision Processes

Authors: Sang Bin Moon, Abolfazl Hashemi

The Adversarial Markov Decision Process (AMDP) is a learning framework that deals with unknown and varying tasks in decision-making applications like robotics and recommendation systems. A major limitation of the AMDP formalism, however, is its pessimistic regret analysis: although the cost function can change from one episode to the next, in many settings its evolution is not truly adversarial. To address this, we introduce and study a new variant of AMDP, which aims to minimize regret while utilizing a set of cost predictors. For this setting, we develop a new policy search method that achieves a sublinear optimistic regret with high probability, that is, a regret bound that degrades gracefully with the estimation power of the cost predictors. Establishing such optimistic regret bounds is nontrivial given that (i) as we demonstrate, the existing importance-weighted cost estimators cannot establish optimistic bounds, and (ii) the feedback model of AMDP is different from (and more realistic than) that of existing optimistic online learning works. Our result, in particular, hinges upon developing a novel optimistically biased cost estimator that leverages cost predictors and enables a high-probability regret analysis without imposing restrictive assumptions. We further discuss practical extensions of the proposed scheme and demonstrate its efficacy numerically.

Subject: UAI.2024 - Oral


#7 Cost-Sensitive Uncertainty-Based Failure Recognition for Object Detection

Authors: Moussa Kassem Sbeyti, Michelle E. Karg, Christian Wirth, Nadja Klein, Sahin Albayrak

Object detectors in real-world applications often fail to detect objects due to varying factors such as weather conditions and noisy input. Therefore, a process that mitigates false detections is crucial for both safety and accuracy. While uncertainty-based thresholding shows promise, previous works demonstrate an imperfect correlation between uncertainty and detection errors. This hinders ideal thresholding, prompting us to further investigate the correlation and associated cost with different types of uncertainty. We therefore propose a cost-sensitive framework for object detection tailored to user-defined budgets on the two types of errors, missing and false detections. We derive minimum thresholding requirements to prevent performance degradation and define metrics to assess the applicability of uncertainty for failure recognition. Furthermore, we automate and optimize the thresholding process to maximize the failure recognition rate w.r.t. the specified budget. Evaluation on three autonomous driving datasets demonstrates that our approach significantly enhances safety, particularly in challenging scenarios. Leveraging localization aleatoric uncertainty and softmax-based entropy only, our method boosts the failure recognition rate by 36-60% compared to conventional approaches. Code is available at https://mos-ks.github.io/publications.
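
The automated thresholding step can be illustrated with a simplified stand-in for the paper's budget-aware optimization: on a validation set, pick the uncertainty threshold that flags as many true detection errors as possible while respecting a user budget on wrongly flagged correct detections.

```python
import numpy as np

def tune_threshold(uncertainty, is_error, budget=0.1):
    # uncertainty: per-detection uncertainty scores; is_error: boolean array
    # marking detections that are actually failures (both classes assumed
    # present in the validation data).
    best_t, best_recall = None, -1.0
    for t in np.unique(uncertainty):
        flagged = uncertainty >= t
        false_flag_rate = np.mean(flagged & ~is_error)   # budgeted error type
        recall = flagged[is_error].mean()                # failure recognition
        if false_flag_rate <= budget and recall > best_recall:
            best_t, best_recall = t, recall
    return best_t, best_recall
```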

Subject: UAI.2024 - Oral


#8 On the Capacitated Facility Location Problem with Scarce Resources

Authors: Gennaro Auricchio, Harry J. Clough, Jie Zhang

This paper investigates the Mechanism Design aspects of the m-Capacitated Facility Location Problem where the total facility capacity is lower than the number of agents. Following Aziz et al. (2020), the Social Welfare of the facility location is determined through a First-Come-First-Served (FCFS) game where agents compete after the facility positions are established. When the number of facilities is $m > 1$, the Nash Equilibrium (NE) of the FCFS game is not unique, thus the utility of the agents and the notion of truthfulness are not well-defined. To address these issues, we consider absolutely truthful mechanisms, i.e. mechanisms able to prevent agents from misreporting regardless of the strategies played during the FCFS game. We pair this more stringent truthfulness requirement with the notion of Equilibrium Stable (ES) mechanism, i.e. mechanisms whose Social Welfare does not depend on the NE of the FCFS game. We show that the class of percentile mechanisms is absolutely truthful and characterize under which conditions they are ES. We then show that the approximation ratio of each ES percentile mechanism is bounded and determine its value. Notably, when all the facilities have the same capacity and the number of agents is large enough, it is possible to achieve an approximation ratio smaller than $1 + \frac{1}{2m-1}$. We enhance our findings by empirically evaluating the mechanisms' performances when agents' true positions follow a distribution.
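
A stripped-down percentile mechanism, with capacities and the FCFS game omitted, looks as follows: facility j is placed at a percentile of the sorted reports that is fixed in advance, which is what yields truthfulness.

```python
import numpy as np

def percentile_mechanism(reports, percentiles):
    # Place one facility at each pre-specified percentile of the sorted
    # reported positions; the percentile vector does not depend on reports.
    x = np.sort(np.asarray(reports, dtype=float))
    n = len(x)
    return x[[int(p * (n - 1)) for p in percentiles]]

# Two facilities at the 25th and 75th percentiles of five reports:
print(percentile_mechanism([0.10, 0.40, 0.50, 0.90, 0.95], [0.25, 0.75]))
```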

Subject: UAI.2024 - Oral


#9 Identification and Estimation of Conditional Average Partial Causal Effects via Instrumental Variable

Authors: Yuta Kawakami, Manabu Kuroki, Jin Tian

There has been considerable recent interest in estimating heterogeneous causal effects. In this paper, we study conditional average partial causal effects (CAPCE) to reveal the heterogeneity of causal effects with continuous treatment. We provide conditions for identifying CAPCE in an instrumental variable setting. Notably, CAPCE is identifiable under a weaker assumption than required by a commonly used measure for estimating heterogeneous causal effects of continuous treatment. We develop three families of CAPCE estimators: sieve, parametric, and reproducing kernel Hilbert space (RKHS)-based, and analyze their statistical properties. We illustrate the proposed CAPCE estimators on synthetic and real-world data.

Subject: UAI.2024 - Oral


#10 Functional Wasserstein Bridge Inference for Bayesian Deep Learning

Authors: Mengjing Wu, Junyu Xuan, Jie Lu

Bayesian deep learning (BDL) is an emerging field that combines the strong function approximation power of deep learning with the uncertainty modeling capabilities of Bayesian methods. Alongside those virtues, however, such a combination brings issues to classical parameter-space variational inference, such as non-meaningful priors, intricate posteriors, and possible pathologies. In this paper, we propose a new function-space variational inference solution called Functional Wasserstein Bridge Inference (FWBI), which can assign meaningful functional priors and obtain well-behaved posteriors. Specifically, we develop a Wasserstein distance-based bridge to avoid the potential pathological behaviors of Kullback–Leibler (KL) divergence between stochastic processes that arise in most existing functional variational inference approaches. The derived functional variational objective is well-defined and proved to be a lower bound of the model evidence. We demonstrate the improved predictive performance and better uncertainty quantification of our FWBI on several tasks compared with various parameter-space and function-space variational methods.

Subject: UAI.2024 - Oral


#11 Reflected Schrödinger Bridge for Constrained Generative Modeling

Authors: Wei Deng, Yu Chen, Nicole Tianjiao Yang, Hengrong Du, Qi Feng, Ricky T. Q. Chen

Diffusion models have become the go-to method for large-scale generative models in real-world applications. These applications often involve data distributions confined within bounded domains, typically requiring ad-hoc thresholding techniques for boundary enforcement. Reflected diffusion models aim to enhance generalizability by generating the data distribution through a backward process governed by reflected Brownian motion. However, reflected diffusion models may not easily adapt to diverse domains without the derivation of proper diffeomorphic mappings and do not guarantee optimal transport properties. To overcome these limitations, we introduce the Reflected Schrödinger Bridge algorithm—an entropy-regularized optimal transport approach tailored for generating data within diverse bounded domains. We derive elegant reflected forward-backward stochastic differential equations with Neumann and Robin boundary conditions, extend divergence-based likelihood training to bounded domains, and explore natural connections to entropic optimal transport for the study of approximate linear convergence—a valuable insight for practical training. Our algorithm yields robust generative modeling in diverse domains, and its scalability is demonstrated in real-world constrained generative modeling through standard image benchmarks.

Subject: UAI.2024 - Oral


#12 Statistical and Causal Robustness for Causal Null Hypothesis Tests

Authors: Junhui Yang, Rohit Bhattacharya, Youjin Lee, Ted Westling

Prior work applying semiparametric theory to causal inference has primarily focused on deriving estimators that exhibit statistical robustness under a prespecified causal model that permits identification of a desired causal parameter. However, a fundamental challenge is correct specification of such a model, which usually involves making untestable assumptions. Evidence factors is an approach to combining hypothesis tests of a common causal null hypothesis under two or more candidate causal models. Under certain conditions, this yields a test that is valid if at least one of the underlying models is correct, which is a form of causal robustness. We propose a method of combining semiparametric theory with evidence factors. We develop a causal null hypothesis test based on joint asymptotic normality of K asymptotically linear semiparametric estimators, where each estimator is based on a distinct identifying functional derived from each of K candidate causal models. We show that this test provides both statistical and causal robustness in the sense that it is valid if at least one of the K proposed causal models is correct, while also allowing for slower than parametric rates of convergence in estimating nuisance functions. We demonstrate the effectiveness of our method via simulations and applications to the Framingham Heart Study and Wisconsin Longitudinal Study.
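
As a rough illustration of the asymptotic machinery, here is a generic joint Wald test built from estimated influence functions; it is not the paper's specific combination rule, which is designed to stay valid when only one of the K causal models is correct.

```python
import numpy as np
from scipy import stats

def joint_wald_test(influence_vals):
    # influence_vals: (n, K) array; column k holds the estimated influence-
    # function values of the k-th asymptotically linear estimator. Under
    # joint asymptotic normality and a null where all K estimands are zero,
    # n * est' Sigma^{-1} est is approximately chi-square with K dof.
    n, K = influence_vals.shape
    est = influence_vals.mean(axis=0)
    Sigma = np.cov(influence_vals, rowvar=False)
    stat = n * est @ np.linalg.solve(Sigma, est)
    return stat, stats.chi2.sf(stat, df=K)
```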

Subject: UAI.2024 - Oral


#13 Pix2Code: Learning to Compose Neural Visual Concepts as Programs

Authors: Antonia Wüst, Wolfgang Stammer, Quentin Delfosse, Devendra Singh Dhami, Kristian Kersting

The challenge in learning abstract concepts from images in an unsupervised fashion lies in the required integration of visual perception and generalizable relational reasoning. Moreover, the unsupervised nature of this task makes it necessary for human users to be able to understand a model's learned concepts and potentially revise false behaviors. To tackle both the generalizability and interpretability constraints of visual concept learning, we propose Pix2Code, a framework that extends program synthesis to visual relational reasoning by utilizing the abilities of both explicit, compositional symbolic and implicit neural representations. This is achieved by retrieving object representations from images and synthesizing relational concepts as λ-calculus programs. We evaluate the diverse properties of Pix2Code on the challenging reasoning domains Kandinsky Patterns and CURI, testing its ability to identify compositional visual concepts that generalize to novel data and concept configurations. Particularly, in stark contrast to neural approaches, we show that Pix2Code's representations remain human interpretable and can easily be revised for improved performance.

Subject: UAI.2024 - Oral


#14 Targeted Reduction of Causal Models

Authors: Armin Kekić, Bernhard Schölkopf, Michel Besserve

Why does a phenomenon occur? Addressing this question is central to most scientific inquiries and often relies on simulations of scientific models. As models become more intricate, deciphering the causes behind phenomena in high-dimensional spaces of interconnected variables becomes increasingly challenging. Causal Representation Learning (CRL) offers a promising avenue to uncover interpretable causal patterns within these simulations through an interventional lens. However, developing general CRL frameworks suitable for practical applications remains an open challenge. We introduce Targeted Causal Reduction (TCR), a method for condensing complex intervenable models into a concise set of causal factors that explain a specific target phenomenon. We propose an information theoretic objective to learn TCR from interventional data of simulations, establish identifiability for continuous variables under shift interventions and present a practical algorithm for learning TCRs. Its ability to generate interpretable high-level explanations from complex models is demonstrated on toy and mechanical systems, illustrating its potential to assist scientists in the study of complex phenomena in a broad range of disciplines.

Subject: UAI.2024 - Oral


#15 Normalizing Flows for Conformal Regression

Author: Nicolò Colombo

Conformal Prediction (CP) algorithms estimate the uncertainty of a prediction model by calibrating its outputs on labeled data. The same calibration scheme usually applies to any model and data without modifications. The obtained prediction intervals are valid by construction but could be inefficient, i.e. unnecessarily big, if the prediction errors are not uniformly distributed over the input space. We present a general scheme to localize the intervals by training the calibration process. The standard prediction error is replaced by an optimized distance metric that depends explicitly on the object attributes. Learning the optimal metric is equivalent to training a Normalizing Flow that acts on the joint distribution of the errors and the inputs. Unlike the Error Re-weighting CP algorithm of Papadopoulos et al. (2008), the framework allows estimating the gap between nominal and empirical conditional validity. The approach is compatible with existing locally-adaptive CP strategies based on re-weighting the calibration samples and applies to any point-prediction model without retraining.
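
For reference, the standard split conformal baseline that this work localizes looks as follows; the paper replaces the plain absolute residual with a learned, input-dependent distance, so this sketch shows only the starting point.

```python
import numpy as np

def split_conformal(residuals_cal, y_pred_test, alpha=0.1):
    # residuals_cal: absolute prediction errors on held-out calibration data.
    # Returns intervals with finite-sample marginal coverage >= 1 - alpha.
    n = len(residuals_cal)
    level = min(np.ceil((n + 1) * (1 - alpha)) / n, 1.0)
    q = np.quantile(np.abs(residuals_cal), level, method="higher")
    return y_pred_test - q, y_pred_test + q
```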

Subject: UAI.2024 - Oral


#16 Characterising Interventions in Causal Games

Authors: Manuj Mishra, James Fox, Michael J. Wooldridge

Causal games are probabilistic graphical models that enable causal queries to be answered in multi-agent settings. They extend causal Bayesian networks by specifying decision and utility variables to represent the agents' degrees of freedom and objectives. In multi-agent settings, whether each agent decides on their policy before or after knowing the causal intervention is important as this affects whether they can respond to the intervention by adapting their policy. Consequently, previous work in causal games imposed chronological constraints on permissible interventions. We relax this by outlining a sound and complete set of primitive causal interventions so the effect of any arbitrarily complex interventional query can be studied in multi-agent settings. We also demonstrate applications to the design of safe AI systems by considering causal mechanism design and commitment.

Subject: UAI.2024 - Oral


#17 Generalized Expected Utility as a Universal Decision Rule - A Step Forward

Authors: Hélène Fargier, Pierre Pomeret-Coquot

In order to capture a larger range of decision rules, this paper extends the seminal work of [Friedman and Halpern, 1995; Chu and Halpern, 2003, 2004] on Generalized Expected Utility. We introduce the notion of an algebraic mass function (and of an algebraic Möbius transform) and provide a new algebraic expression for expected utility based on such functions. This utility, which we call "XEU", generalizes Chu and Halpern's GEU to non-decomposable measures and allows for the representation of several rules that could not be captured up to this point, notably the Choquet integral. A representation theorem is provided that shows that only a very weak condition is needed for a rule to be representable as an XEU.
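
For orientation, the classical special case being generalized can be written out: with the Möbius transform $m$ of a capacity $\mu$, the Choquet integral of a utility $u$ is a mass-weighted sum, and XEU replaces the sum, product, and minimum by abstract operators on an algebraic mass function (this is the textbook identity, not the paper's full construction):

```latex
% Moebius transform of a capacity and the Choquet integral it induces.
m(A) \;=\; \sum_{B \subseteq A} (-1)^{|A \setminus B|}\, \mu(B),
\qquad
\mathrm{Ch}_{\mu}(u) \;=\; \sum_{A \subseteq S} m(A) \,\min_{s \in A} u(s).
```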

Subject: UAI.2024 - Oral


#18 Learning Accurate and Interpretable Decision Trees

Authors: Maria Florina Balcan, Dravyansh Sharma

Decision trees are a popular tool in machine learning and yield easy-to-understand models. Several techniques have been proposed in the literature for learning a decision tree classifier, with different techniques working well for data from different domains. In this work, we develop approaches to design decision tree learning algorithms given repeated access to data from the same domain. We propose novel parameterized classes of node splitting criteria in top-down algorithms, which interpolate between popularly used entropy and Gini impurity based criteria, and provide theoretical bounds on the number of samples needed to learn the splitting function appropriate for the data at hand. We also study the sample complexity of tuning prior parameters in Bayesian decision tree learning, and extend our results to decision tree regression. We further consider the problem of tuning hyperparameters in pruning the decision tree for classical pruning algorithms including min-cost complexity pruning. We also study the interpretability of the learned decision trees and introduce a data-driven approach for optimizing the explainability versus accuracy trade-off using decision trees. Finally, we demonstrate the significance of our approach on real world datasets by learning data-specific decision trees which are simultaneously more accurate and interpretable.
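
One concrete family with this interpolating behavior is the Tsallis entropy, shown here as an illustration of the idea (the paper's parameterized classes may differ): the parameter q recovers Shannon entropy as q → 1 and Gini impurity at q = 2.

```python
import numpy as np

def tsallis_impurity(p, q):
    # Interpolates the classical splitting criteria: q -> 1 gives Shannon
    # entropy, q = 2 gives Gini impurity.
    p = np.asarray(p, dtype=float)
    p = p[p > 0]
    if abs(q - 1.0) < 1e-8:
        return float(-(p * np.log(p)).sum())
    return float((1.0 - (p ** q).sum()) / (q - 1.0))

print(tsallis_impurity([0.5, 0.5], 1.0))  # entropy: log 2 ~ 0.693
print(tsallis_impurity([0.5, 0.5], 2.0))  # Gini: 0.5
```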

Subject: UAI.2024 - Oral


#19 Group Fairness in Predict-Then-Optimize Settings for Restless Bandits

Authors: Shresth Verma, Yunfan Zhao, Sanket Shah, Niclas Boehmer, Aparna Taneja, Milind Tambe

Restless multi-armed bandits (RMABs) are a model for sequentially allocating a limited number of resources to agents modeled as Markov Decision Processes. RMABs have applications in cellular networks, anti-poaching, and in particular, healthcare. For such high-stakes use cases, allocations are often required to treat different groups of agents (e.g., defined by sensitive attributes) fairly. In addition to the fairness challenge, agents' transition probabilities are often unknown and need to be learned in real-world problems. Thus, group fairness in RMABs requires us to simultaneously learn transition probabilities and how much budget we allocate to each group. Overcoming this key challenge ignored by previous work, we develop a decision-focused-learning pipeline to solve equitable RMABs, using a novel budget allocation algorithm to prevent disparity between groups. Our results on both synthetic and real-world large-scale datasets demonstrate that incorporating fair planning into the learning step greatly improves equity with little sacrifice in utility.

Subject: UAI.2024 - Oral


#20 Causally Abstracted Multi-armed Bandits

Authors: Fabio Massimo Zennaro, Nicholas George Bishop, Joel Dyer, Yorgos Felekis, Ani Calinescu, Michael J. Wooldridge, Theodoros Damoulas

Multi-armed bandits (MAB) and causal MABs (CMAB) are established frameworks for decision-making problems. The majority of prior work typically studies and solves individual MAB and CMAB in isolation for a given problem and associated data. However, decision-makers are often faced with multiple related problems and multi-scale observations where joint formulations are needed in order to efficiently exploit the problem structures and data dependencies. Transfer learning for CMABs addresses the situation where models are defined on identical variables, although causal connections may differ. In this work, we extend transfer learning to setups involving CMABs defined on potentially different variables, with varying degrees of granularity, and related via an abstraction map. Formally, we introduce the problem of causally abstracted MABs (CAMABs) by relying on the theory of causal abstraction in order to express a rigorous abstraction map. We propose algorithms to learn in a CAMAB, and study their regret. We illustrate the limitations and the strengths of our algorithms on a real-world scenario related to online advertising.

Subject: UAI.2024 - Oral


#21 A Graph Theoretic Approach for Preference Learning with Feature Information

Authors: Aadirupa Saha, Arun Rajkumar

We consider the problem of ranking a set of $n$ items given a sample of their pairwise preferences. It is well known from the classical results of the sorting literature that, without any further assumptions, one requires a sample size of $\Omega(n \log n)$ with active selection of pairs, whereas for a random set of pairwise preferences the bound could be as bad as $\Omega(n^2)$. However, what if the learner has additional knowledge of the items' features, and their pairwise preferences are known to be modelled in terms of their feature similarities; can these bounds be improved? In particular, we introduce a new probabilistic preference model, called feature-Bradley-Terry-Luce (f-BTL), for this purpose, and present a new least-squares based algorithm, fBTL-LS, which requires a sample complexity much smaller than $O(n \log n)$ random pairs to obtain a "good" ranking. The sample complexity of our proposed algorithm depends on the degree of feature correlation of the items, and its analysis makes use of tools from classical graph matching theory, shedding light on the true complexity of the problem; this was not possible before with existing matrix-completion based tools. We also prove the tightness of our results, showing a matching information-theoretic lower bound for the problem. Our theoretical results are corroborated with extensive experimental evaluations on varying datasets.

Subject: UAI.2024 - Oral


#22 Bayesian Active Learning in the Presence of Nuisance Parameters

Authors: Sabina J. Sloman, Ayush Bharti, Julien Martinelli, Samuel Kaski

In many settings, such as scientific inference, optimization, and transfer learning, the learner has a well-defined objective, which can be treated as estimation of a target parameter, and no intrinsic interest in characterizing the entire data-generating process. Usually, the learner must also contend with additional sources of uncertainty or variables, that is, with nuisance parameters. Bayesian active learning, or sequential optimal experimental design, can straightforwardly accommodate the presence of nuisance parameters, and so is a natural active learning framework for such problems. However, the introduction of nuisance parameters can lead to bias in the Bayesian learner's estimate of the target parameters, a phenomenon we refer to as negative interference. We characterize the threat of negative interference and how it fundamentally changes the nature of the Bayesian active learner's task. We show that the extent of negative interference can be extremely large, and that accurate estimation of the nuisance parameters is critical to reducing it. The Bayesian active learner is confronted with a dilemma: whether to spend a finite acquisition budget in pursuit of estimation of the target or of the nuisance parameters. Our setting encompasses Bayesian transfer learning as a special case, and our results shed light on the phenomenon of negative transfer between learning environments.

Subject: UAI.2024 - Oral


#23 Understanding Pathologies of Deep Heteroskedastic Regression

Authors: Eliot Wong-Toi, Alex James Boyd, Vincent Fortuin, Stephan Mandt

Deep, overparameterized regression models are notorious for their tendency to overfit. This problem is exacerbated in heteroskedastic models, which predict both mean and residual noise for each data point. At one extreme, these models fit all training data perfectly, eliminating residual noise entirely; at the other, they overfit the residual noise while predicting a constant, uninformative mean. We observe a lack of middle ground, suggesting a phase transition dependent on model regularization strength. We verify this conjecture empirically by fitting numerous models with varying mean and variance regularization. To explain the transition, we develop a theoretical framework based on statistical field theory, yielding qualitative agreement with experiments. As a practical consequence, our analysis simplifies hyperparameter tuning from a two-dimensional to a one-dimensional search, substantially reducing the computational burden. Experiments on diverse datasets, including UCI datasets and the large-scale ClimSim climate dataset, demonstrate significantly improved performance in various calibration tasks.
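
The two regularization strengths in question are the penalties on the mean and variance networks of a heteroskedastic Gaussian likelihood; a minimal sketch of such an objective (generic, not the paper's exact setup):

```python
import torch

def heteroskedastic_loss(mean, log_var, y, mean_params, var_params,
                         lam_mean=1e-4, lam_var=1e-4):
    # Gaussian negative log-likelihood with a predicted per-point variance,
    # plus separate L2 penalties on the mean and variance networks; the
    # phase transition discussed above is governed by these two strengths.
    nll = 0.5 * (log_var + (y - mean) ** 2 / log_var.exp()).mean()
    reg = (lam_mean * sum((p ** 2).sum() for p in mean_params)
           + lam_var * sum((p ** 2).sum() for p in var_params))
    return nll + reg
```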

Subject: UAI.2024 - Oral


#24 Support Recovery in Sparse PCA with General Missing Data

Authors: Hanbyul Lee, Qifan Song, Jean Honorio

We analyze a sparse PCA algorithm for incomplete and noisy data without any specific model assumption on the data missing scheme. We utilize a graphical approach to characterize general missing patterns, which enables us to analyze the effect of structural properties of missing patterns on the solvability of the sparse PCA problem. The sparse PCA method we focus on is a semidefinite relaxation of the $\ell_1$-regularized PCA problem. We provide theoretical justification that the support of the sparse leading eigenvector can be recovered with high probability using the algorithm, under certain conditions. The conditions involve the spectral gap between the largest and second-largest eigenvalues of the true data matrix, the magnitude of the noise, and the structural properties of the missing pattern. The concepts of algebraic connectivity and irregularity are used to describe the properties in a graphical way. We empirically justify our theorem with synthetic data analysis. We show that the SDP algorithm outperforms other sparse PCA approaches especially when the observation pattern has good structural properties. As a by-product of our analysis, we provide two theorems to handle general missing schemes, which can be applied to other problems related to incomplete data matrices.
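
The semidefinite relaxation in question is standard and easy to state; a minimal sketch follows, where the construction of the input matrix from incomplete, noisy data is the paper's contribution and is omitted here.

```python
import cvxpy as cp
import numpy as np

def sparse_pca_sdp(Sigma_hat, lam):
    # Semidefinite relaxation of l1-regularized PCA: maximize explained
    # variance minus an l1 penalty over unit-trace PSD matrices, then read
    # the sparse leading eigenvector off the top eigenvector of X*.
    d = Sigma_hat.shape[0]
    X = cp.Variable((d, d), symmetric=True)
    objective = cp.Maximize(cp.trace(Sigma_hat @ X) - lam * cp.sum(cp.abs(X)))
    problem = cp.Problem(objective, [cp.trace(X) == 1, X >> 0])
    problem.solve()
    eigvals, eigvecs = np.linalg.eigh(X.value)
    return eigvecs[:, -1]
```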

Subject: UAI.2024 - Oral


#25 A General Identification Algorithm For Data Fusion Problems Under Systematic Selection

Authors: Jaron J.R. Lee, AmirEmad Ghassami, Ilya Shpitser

Causal inference is made challenging by confounding, selection bias, and other complications. A common approach to addressing these difficulties is the inclusion of auxiliary data on the superpopulation of interest. Such data may measure a different set of variables, or be obtained under different experimental conditions than the primary dataset. Analysis based on multiple datasets must carefully account for similarities between datasets, while appropriately accounting for differences. In addition, selection of experimental units into different datasets may be systematic; similar difficulties are encountered in missing data problems. Existing methods for combining datasets either do not consider this issue, or assume simple selection mechanisms. In this paper, we provide a general approach, based on graphical causal models, for causal inference from data on the same superpopulation that is obtained under different experimental conditions. Our framework allows both arbitrary unobserved confounding, and arbitrary selection processes into different experimental regimes in our data. We describe how systematic selection processes may be organized into a hierarchy similar to censoring processes in missing data: selected completely at random (SCAR), selected at random (SAR), and selected not at random (SNAR). In addition, we provide a general identification algorithm for interventional distributions in this setting.

Subject: UAI.2024 - Oral