UAI.2023

Total: 243

#1 Towards Physically Reliable Molecular Representation Learning

Authors: Seunghoon Yi, Youngwoo Cho, Jinhwan Sul, Seung Woo Ko, Soo Kyung Kim, Jaegul Choo, Hongkee Yoon, Joonseok Lee

TL;DR: We propose a physics-driven molecular representation learning method powered by self-supervised masked atomic modeling, and novel evaluation schemes to ensure reliability of the model in various ways.

Estimating the energetic properties of molecular systems is a critical task in material design. Machine learning has shown remarkable promise on this task over classical force fields, but a fully data-driven approach suffers from limited labeled data: not only is the amount of available data small, but the distribution of labeled examples is also highly skewed toward stable states. In this work, we propose a molecular representation learning method that extrapolates well beyond the training distribution, powered by physics-driven parameter estimation from classical energy equations and self-supervised learning inspired by masked language modeling. To ensure reliability of the proposed model, we introduce a series of novel evaluation schemes covering multiple facets, beyond the energy or force accuracy that has been used predominantly. Through extensive experiments, we demonstrate that the proposed method is effective in discovering molecular structures, outperforming other baselines. Furthermore, we extrapolate it to chemical reaction pathways beyond stable states, taking a step towards physically reliable molecular representation learning.
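
As a rough illustration of the masked-modeling objective the paper adapts from language models, the sketch below corrupts a fraction of atom identities and keeps the originals as reconstruction targets. This is a generic NumPy sketch, not the authors' implementation; the 15% mask rate, integer atom encoding, and MASK token are assumptions for the example.

```python
import numpy as np

rng = np.random.default_rng(0)
atoms = np.array([6, 6, 8, 1, 1, 1])  # atomic numbers of a toy molecule (C, C, O, H, H, H)
MASK = 0                              # reserved id for masked positions

# Masked atomic modeling: hide some atoms and train a model to recover
# their identities from the remaining molecular structure.
is_masked = rng.uniform(size=atoms.size) < 0.15
corrupted = np.where(is_masked, MASK, atoms)
targets = atoms[is_masked]            # what the model must reconstruct
print(corrupted, targets)
```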

Subject: UAI.2023 - Oral


#2 Probabilistic Circuits That Know What They Don't Know

Authors: Fabrizio Ventola, Steven Braun, Zhongjie Yu, Martin Mundt, Kristian Kersting

TL;DR: We show that probabilistic circuits can be overconfident and not robust to out-of-distribution data, and we overcome this challenge by introducing a tractable sampling-free inference procedure to estimate model uncertainty.

Probabilistic circuits (PCs) are models that allow exact and tractable probabilistic inference. In contrast to neural networks, they are often assumed to be well-calibrated and robust to out-of-distribution (OOD) data. In this paper, we show that PCs are in fact not robust to OOD data, i.e., they don't know what they don't know. We then show how this challenge can be overcome by model uncertainty quantification. To this end, we propose tractable dropout inference (TDI), an inference procedure to estimate uncertainty by deriving an analytical solution to Monte Carlo dropout (MCD) through variance propagation. Unlike MCD in neural networks, which comes at the cost of multiple network evaluations, TDI provides tractable sampling-free uncertainty estimates in a single forward pass. TDI improves the robustness of PCs to distribution shift and OOD data, as demonstrated through a series of experiments evaluating the classification confidence and uncertainty estimates on real-world data.
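
To make the variance-propagation idea concrete, here is a minimal sketch for a single linear unit with (inverted) dropout: the mean and variance induced by the random mask have closed forms, so one deterministic pass can replace many stochastic ones. This is an analogy under simplified assumptions, not the paper's TDI derivation for probabilistic circuits.

```python
import numpy as np

rng = np.random.default_rng(0)
p_keep = 0.8                     # dropout keep probability
w = rng.normal(size=50)          # weights of one linear unit
x = rng.normal(size=50)          # a fixed input

# Monte Carlo dropout: estimate output moments from many noisy passes.
masks = rng.binomial(1, p_keep, size=(100_000, 50)) / p_keep
mc = masks @ (w * x)
print("MC dropout  mean/var:", mc.mean(), mc.var())

# Variance propagation: the same moments in closed form, in one pass.
mean = np.sum(w * x)
var = np.sum((w * x) ** 2) * (1 - p_keep) / p_keep
print("closed form mean/var:", mean, var)
```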

Subject: UAI.2023 - Oral


#3 MixupE: Understanding and Improving Mixup from Directional Derivative Perspective

Authors: Yingtian Zou, Vikas Verma, Sarthak Mittal, Wai Hoh Tang, Hieu Pham, Juho Kannala, Yoshua Bengio, Arno Solin, Kenji Kawaguchi

TL;DR: We propose a theory-driven improvement of Mixup, which is theoretically and empirically validated to be effective.

Mixup is a popular data augmentation technique for training deep neural networks where additional samples are generated by linearly interpolating pairs of inputs and their labels. This technique is known to improve the generalization performance in many learning paradigms and applications. In this work, we first analyze Mixup and show that it implicitly regularizes infinitely many directional derivatives of all orders. Based on this new insight, we propose an improved version of Mixup, theoretically justified to deliver better generalization performance than the vanilla Mixup. To demonstrate the effectiveness of the proposed method, we conduct experiments across various domains such as images, tabular data, speech, and graphs. Our results show that the proposed method improves Mixup across various datasets using a variety of architectures, for instance, exhibiting an improvement over Mixup by 0.8% in ImageNet top-1 accuracy.
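
For reference, the vanilla mixup operation that MixupE improves on looks roughly like the sketch below; MixupE's additional regularization of directional derivatives is not shown, and the Beta parameter and batch shapes are illustrative.

```python
import numpy as np

def mixup_batch(x, y, alpha=0.2, rng=None):
    """Vanilla mixup: convex combinations of paired inputs and labels."""
    rng = rng or np.random.default_rng()
    lam = rng.beta(alpha, alpha)        # mixing coefficient
    perm = rng.permutation(len(x))      # random partner for each example
    return lam * x + (1 - lam) * x[perm], lam * y + (1 - lam) * y[perm]

rng = np.random.default_rng(0)
x = rng.random((8, 32, 32, 3))          # toy image batch
y = np.eye(10)[rng.integers(0, 10, 8)]  # one-hot labels
x_mix, y_mix = mixup_batch(x, y, rng=rng)
```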

Subject: UAI.2023 - Oral


#4 The Shrinkage-Delinkage Trade-off: An Analysis of Factorized Gaussian Approximations for Variational Inference

Authors: Charles Margossian, Lawrence K. Saul

TL;DR: We examine the uncertainty deficit of variational inference when using a factorized Gaussian approximation.

When factorized approximations are used for variational inference (VI), they tend to underestimate the uncertainty (as measured in various ways) of the distributions they are meant to approximate. We consider two popular ways to measure the uncertainty deficit of VI: (i) the degree to which it underestimates the componentwise variance, and (ii) the degree to which it underestimates the entropy. To better understand these effects, and the relationship between them, we examine an informative setting where they can be explicitly (and elegantly) analyzed: the approximation of a Gaussian p with a dense covariance matrix by a Gaussian q with a diagonal covariance matrix. We prove that q always underestimates both the componentwise variance and the entropy of p, though not necessarily to the same degree. Moreover, we demonstrate that the entropy of q is determined by the trade-off of two competing forces: it is decreased by the shrinkage of its componentwise variances (our first measure of uncertainty) but it is increased by the factorized approximation, which delinks the nodes in the graphical model of p. We study various manifestations of this trade-off, notably one where, as the dimension of the problem grows, the per-component entropy gap between p and q becomes vanishingly small even though q underestimates every componentwise variance by a constant multiplicative factor. We also use the shrinkage-delinkage trade-off to bound the entropy gap in terms of the problem dimension and the condition number of the correlation matrix of p. Finally, we present empirical results on both Gaussian and non-Gaussian targets, the former to validate our analysis and the latter to explore its limitations.
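
The analyzed setting is easy to reproduce numerically: for a mean-field Gaussian q fit by minimizing KL(q||p), the optimal componentwise variances are the reciprocals of the diagonal entries of p's precision matrix, so both the variance shrinkage and the entropy gap can be checked directly. A minimal sketch, assuming this standard KL(q||p) optimum:

```python
import numpy as np

rng = np.random.default_rng(0)
d = 5
A = rng.normal(size=(d, d))
Sigma = A @ A.T + d * np.eye(d)    # dense covariance of the target p
Lambda = np.linalg.inv(Sigma)      # precision matrix of p

# Mean-field VI optimum: diagonal q with variances 1 / Lambda_ii.
q_var = 1.0 / np.diag(Lambda)
print("variance ratios q/p:", q_var / np.diag(Sigma))        # all <= 1

H_p = 0.5 * np.linalg.slogdet(2 * np.pi * np.e * Sigma)[1]   # entropy of p
H_q = 0.5 * np.sum(np.log(2 * np.pi * np.e * q_var))         # entropy of q
print("entropy gap H(p) - H(q):", H_p - H_q)                 # >= 0
```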

Subject: UAI.2023 - Oral


#5 Neural Probabilistic Logic Programming in Discrete-Continuous Domains

Authors: Lennert De Smet, Pedro Zuidberg Dos Martires, Robin Manhaeve, Giuseppe Marra, Angelika Kimmig, Luc De Raedt

TL;DR: DeepSeaProbLog: a neural probabilistic logic programming language with discrete and continuous random variables.

Neural-symbolic AI (NeSy) allows neural networks to exploit symbolic background knowledge in the form of logic. It has been shown to aid learning in the limited data regime and to facilitate inference on out-of-distribution data. Probabilistic NeSy focuses on integrating neural networks with both logic and probability theory, which additionally allows learning under uncertainty. A major limitation of current probabilistic NeSy systems, such as DeepProbLog, is their restriction to finite probability distributions, i.e., discrete random variables. In contrast, deep probabilistic programming (DPP) excels in modelling and optimising continuous probability distributions. Hence, we introduce DeepSeaProbLog, a neural probabilistic logic programming language that incorporates DPP techniques into NeSy. Doing so results in the support of inference and learning of both discrete and continuous probability distributions under logical constraints. Our main contributions are 1) the semantics of DeepSeaProbLog and its corresponding inference algorithm, 2) a proven asymptotically unbiased learning algorithm, and 3) a series of experiments that illustrate the versatility of our approach.

Subject: UAI.2023 - Oral


#6 Partial Identification of Dose Responses with Hidden Confounders

Authors: Myrl G Marmarelis, Greg Ver Steeg, Andrew Jesson, Elizabeth Haddad, Neda Jahanshad, Aram Galstyan

TL;DR: We bound the estimated causal effects of continuous-valued treatments when they might be biased by hidden confounders.

Inferring causal effects of continuous-valued treatments from observational data is a crucial task promising to better inform policy- and decision-makers. A critical assumption needed to identify these effects is that all confounding variables (causal parents of both the treatment and the outcome) are included as covariates. Unfortunately, given observational data alone, we cannot know with certainty that this criterion is satisfied. Sensitivity analyses provide principled ways to give bounds on causal estimates when confounding variables are hidden. While much attention is focused on sensitivity analyses for discrete-valued treatments, much less is paid to continuous-valued treatments. We present novel methodology to bound both average and conditional average continuous-valued treatment-effect estimates when they cannot be point identified due to hidden confounding. A semi-synthetic benchmark on multiple datasets shows that our method gives tighter coverage of the true dose-response curve than a recently proposed continuous sensitivity model and other baselines. Finally, we apply our method to a real-world observational case study to demonstrate the value of identifying dose-dependent causal effects.

Subject: UAI.2023 - Oral


#7 Human-in-the-Loop Mixup

Authors: Katherine M. Collins, Umang Bhatt, Weiyang Liu, Vihari Piratla, Ilia Sucholutsky, Bradley C. Love, Adrian Weller

TL;DR: Synthetic labels used in mixup are not consistently aligned with human perceptual judgments; relabeling examples with humans in the loop, leveraging human uncertainty information, holds promise for increasing downstream model reliability.

Aligning model representations to humans has been found to improve robustness and generalization. However, such methods often focus on standard observational data. Synthetic data is proliferating and powering many advances in machine learning; yet, it is not always clear whether synthetic labels are perceptually aligned to humans, rendering it likely that model representations are not human-aligned. We focus on the synthetic data used in mixup: a powerful regularizer shown to improve model robustness, generalization, and calibration. We design a comprehensive series of elicitation interfaces, which we release as HILL MixE Suite, and recruit 159 participants to provide perceptual judgments, along with their uncertainties, over mixup examples. We find that human perceptions do not consistently align with the labels traditionally used for synthetic points, and we begin to demonstrate how these findings can increase the reliability of downstream models, particularly when human uncertainty is incorporated. We release all elicited judgments in a new data hub we call H-Mix.

Subject: UAI.2023 - Oral


#8 Revisiting Bayesian Network Learning with Small Vertex Cover

Authors: Juha Harviainen, Mikko Koivisto

TL;DR: We present new algorithms for learning, sampling and counting Bayesian networks parameterized by the vertex cover number.

The problem of structure learning in Bayesian networks asks for a directed acyclic graph (DAG) that maximizes a given scoring function. Since the problem is NP-hard, research effort has been put into discovering restricted classes of DAGs for which the search problem can be solved in polynomial time. Here, we initiate investigation of questions that have received less attention thus far: Are the known polynomial algorithms close to the best possible, or is there room for significant improvements? If the interest is in Bayesian learning, that is, in sampling or weighted counting of DAGs, can we obtain similar complexity results? Focusing on DAGs with bounded vertex cover number, a class studied in Korhonen and Parviainen's seminal work (NIPS 2015), we answer the questions in the affirmative. We also give what is apparently the first proof that the counting problem is #P-hard in general. In addition, we show that under the vertex-cover constraint counting is #W[1]-hard.

Subject: UAI.2023 - Oral


#9 On Inference and Learning With Probabilistic Generating Circuits

Authors: Juha Harviainen, Vaidyanathan Peruvemba Ramaswamy, Mikko Koivisto

TL;DR: We present faster inference algorithms for probabilistic generating circuits and study the hardness of parameter learning.

Probabilistic generating circuits (PGCs) are economical representations of multivariate probability generating polynomials (PGPs). They unify and extend decomposable probabilistic circuits and determinantal point processes, admitting tractable computation of marginal probabilities. However, the need for addition and multiplication of high-degree polynomials incurs a significant additional factor in the complexity of inference. Here, we give a new inference algorithm that eliminates this extra factor. Specifically, we show that it suffices to keep track of the highest degree coefficients of the computed polynomials, rendering the algorithm linear in the circuit size. In addition, we show that determinant-based circuits need not be expanded to division-free circuits, but can be handled by division-based fast algorithms. While these advances enhance the appeal of PGCs, we also discover an obstacle to learning them from data: it is NP-hard to recognize whether a given PGC encodes a PGP. We discuss the implications of our ambivalent findings and sketch a method, in which learning is restricted to PGCs that are composed of moderate-size subcircuits.

Subject: UAI.2023 - Oral


#10 Quantifying Aleatoric and Epistemic Uncertainty in Machine Learning: Are Conditional Entropy and Mutual Information Appropriate Measures?

Authors: Lisa Wimmer, Yusuf Sale, Paul Hofman, Bernd Bischl, Eyke Hüllermeier

The quantification of aleatoric and epistemic uncertainty in terms of conditional entropy and mutual information, respectively, has recently become quite common in machine learning. While the properties of these measures, which are rooted in information theory, seem appealing at first glance, we identify various incoherencies that call their appropriateness into question. In addition to the measures themselves, we critically discuss the idea of an additive decomposition of total uncertainty into its aleatoric and epistemic constituents. Experiments across different computer vision tasks support our theoretical findings and raise concerns about current practice in uncertainty quantification.
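
The decomposition under scrutiny is usually computed from an ensemble of predictive distributions, as below; this sketch shows the standard recipe the paper questions (the toy predictions are invented for the example), not a proposed fix.

```python
import numpy as np

def entropy(p, axis=-1):
    return -np.sum(p * np.log(p + 1e-12), axis=axis)

# Predictions of M = 3 ensemble members over K = 3 classes for one input.
preds = np.array([[0.7, 0.2, 0.1],
                  [0.4, 0.5, 0.1],
                  [0.6, 0.3, 0.1]])

total = entropy(preds.mean(axis=0))  # H(E[p]): total uncertainty
aleatoric = entropy(preds).mean()    # E[H(p)]: conditional entropy
epistemic = total - aleatoric        # mutual information, >= 0
print(total, aleatoric, epistemic)
```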

Subject: UAI.2023 - Oral


#11 Provably Efficient Adversarial Imitation Learning with Unknown Transitions

Authors: Tian Xu, Ziniu Li, Yang Yu, Zhi-Quan Luo

TL;DR: We theoretically explore adversarial imitation learning with unknown transitions.

The process of learning good policies from expert demonstrations, known as imitation learning (IL), has been proven effective in many applications. Adversarial imitation learning (AIL), a subset of IL methods, is particularly promising, but its theoretical foundation in the presence of unknown transitions has yet to be fully developed. This paper explores the theoretical underpinnings of AIL in this context, where the primary challenge is the stochastic and uncertain nature of environment transitions. We examine the expert sample complexity and interaction complexity required to recover good policies, which are of great practical interest. To this end, we establish a framework connecting reward-free exploration and AIL, and propose an algorithm, MB-TAIL, that achieves the minimax optimal expert sample complexity of Õ(H^{3/2}|S|/ε) and interaction complexity of Õ(H^3|S|^2|A|/ε^2). Here, H represents the planning horizon, |S| is the state space size, |A| is the action space size, and ε is the desired imitation gap. MB-TAIL is the first algorithm to achieve this level of expert sample complexity in the unknown transition setting, and it improves upon the interaction complexity of the best-known algorithm, OAL, by a factor of O(H). Additionally, we demonstrate the generalization ability of MB-TAIL by extending it to the function approximation setting and proving that it can achieve expert sample and interaction complexity independent of |S|.

Subject: UAI.2023 - Oral


#12 An Improved Variational Approximate Posterior for the Deep Wishart Process

Authors: Sebastian W. Ober, Ben Anson, Edward Milsom, Laurence Aitchison

Deep kernel processes are a recently introduced class of deep Bayesian models that have the flexibility of neural networks, but work entirely with Gram matrices. They operate by alternately sampling a Gram matrix from a distribution over positive semi-definite matrices, and applying a deterministic transformation. When the distribution is chosen to be Wishart, the model is called a deep Wishart process (DWP). This particular model is of interest because its prior is equivalent to a deep Gaussian process (DGP) prior, but at the same time it is invariant to rotational symmetries, leading to a simpler posterior distribution. Practical inference in the DWP was made possible in recent work ("A variational approximate posterior for the deep Wishart process", Ober and Aitchison, 2021a), where the authors used a generalisation of the Bartlett decomposition of the Wishart distribution as the variational approximate posterior. However, predictive performance in that paper was less impressive than one might expect, with the DWP only beating a DGP on a few of the UCI datasets used for comparison. In this paper, we show that further generalising their distribution to allow linear combinations of rows and columns in the Bartlett decomposition results in better predictive performance, while incurring negligible additional computational cost.
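
For context, the Bartlett decomposition underlying both the earlier and the proposed variational posteriors gives a simple way to sample a Wishart matrix; the sketch below shows only this standard construction, not the generalized families used in the paper.

```python
import numpy as np

def wishart_bartlett(df, Sigma, rng):
    """Sample from Wishart(df, Sigma) via the Bartlett decomposition."""
    d = Sigma.shape[0]
    L = np.linalg.cholesky(Sigma)
    A = np.zeros((d, d))
    # Diagonal: square roots of chi-square draws with df, df-1, ... dof.
    A[np.diag_indices(d)] = np.sqrt(rng.chisquare(df - np.arange(d)))
    # Strict lower triangle: independent standard normals.
    A[np.tril_indices(d, k=-1)] = rng.normal(size=d * (d - 1) // 2)
    LA = L @ A
    return LA @ LA.T

rng = np.random.default_rng(0)
Sigma = np.array([[2.0, 0.5], [0.5, 1.0]])
samples = [wishart_bartlett(5, Sigma, rng) for _ in range(10_000)]
print(np.mean(samples, axis=0))  # approaches df * Sigma
```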

Subject: UAI.2023 - Oral


#13 Local Message Passing on Frustrated Systems

Authors: Luca Schmid, Joshua Brenk, Laurent Schmalen

TL;DR: This work proposes a novel method to derive efficient message passing algorithms for approximate inference on graphs with many cycles.

Message passing on factor graphs is a powerful framework for probabilistic inference, which finds important applications in various scientific domains. The most widespread message passing scheme is the sum-product algorithm (SPA), which gives exact results on trees but often fails on graphs with many small cycles. We search for an alternative message passing algorithm that works particularly well on such cyclic graphs. To this end, we challenge the extrinsic principle of the SPA, which loses its purpose on graphs with cycles. We further replace the local SPA message update rule at the factor nodes of the underlying graph with a generic mapping, which is optimized in a data-driven fashion. These modifications lead to a considerable improvement in performance, while preserving the simplicity of the SPA. We evaluate our method for two classes of cyclic graphs: the 2x2 fully connected Ising grid and factor graphs for symbol detection on linear communication channels with inter-symbol interference. To enable the method for large graphs as they occur in practical applications, we develop a novel loss function which is inspired by the Bethe approximation from statistical physics and allows for training in an unsupervised fashion.
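
The local rule that the paper replaces with a learned mapping is the standard SPA update at a factor node; for a pairwise Ising factor over binary spins it reads as in this sketch (the coupling value and incoming message are illustrative).

```python
import numpy as np

# SPA update at a pairwise factor f(x1, x2) over spins {-1, +1}:
#   mu_{f->x1}(x1) = sum_{x2} f(x1, x2) * mu_{x2->f}(x2)
J = 0.9                                     # Ising coupling strength
f = np.exp(J * np.outer([-1, 1], [-1, 1]))  # 2x2 factor table
mu_in = np.array([0.3, 0.7])                # incoming message about x2

mu_out = f @ mu_in
mu_out /= mu_out.sum()                      # normalize for stability
print(mu_out)
```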

Subject: UAI.2023 - Oral


#14 Learning from Low Rank Tensor Data: A Random Tensor Theory Perspective

Authors: Mohamed El Amine Seddik, Malik Tiomoko, Alexis Decurninge, Maxime Guillaud, Maxim Panov

Under a simplified data model, this paper provides a theoretical analysis of learning from data that have an underlying low-rank tensor structure in both supervised and unsupervised settings. For the supervised setting, we provide an analysis of a Ridge classifier (with a high regularization parameter) with and without knowledge of the low-rank structure of the data. Our results quantify analytically the gain in misclassification error achieved by exploiting the low-rank structure for denoising purposes, as opposed to treating data as mere vectors. We further provide a similar analysis in the context of clustering, thereby quantifying the exact performance gap between tensor methods and standard approaches that treat data as simple vectors.

Subjects: UAI.2023 - Oral, UAI.2023 - Accept


#15 Meta-learning Control Variates: Variance Reduction with Limited Data

Authors: Zhuo Sun, Chris J. Oates, Francois-Xavier Briol

Control variates can be a powerful tool to reduce the variance of Monte Carlo estimators, but constructing effective control variates can be challenging when the number of samples is small. In this paper, we show that when a large number of related integrals need to be computed, it is possible to leverage the similarity between these integration tasks to improve performance even when the number of samples per task is very small. Our approach, called meta-learning CVs (Meta-CVs), can be used for up to hundreds or thousands of tasks. Our empirical assessment indicates that Meta-CVs can lead to significant variance reduction in such settings, and our theoretical analysis establishes general conditions under which Meta-CVs can be successfully trained.
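
The single-task estimator that Meta-CVs improve is the classical control-variate correction; here is a minimal sketch with a hand-picked control variate (the integrand, sample size, and control function are illustrative).

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.uniform(size=1_000)

f = np.exp(x)  # integrand; the true mean is e - 1
g = x          # control variate with known mean 1/2

C = np.cov(f, g)
c = C[0, 1] / C[1, 1]  # estimated optimal coefficient
plain = f.mean()
with_cv = (f - c * (g - 0.5)).mean()
print("plain:", plain, "with CV:", with_cv)
print("variance ratio:", np.var(f - c * (g - 0.5)) / np.var(f))
```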

Subject: UAI.2023 - Oral


#16 Is the Volume of a Credal Set a Good Measure for Epistemic Uncertainty?

Authors: Yusuf Sale, Michele Caprio, Eyke Hüllermeier

TL;DR: We show that the volume of a credal set is a good measure for epistemic uncertainty in a binary classification setting, while it ceases to be so in the multi-class setting.

Adequate uncertainty representation and quantification have become imperative in various scientific disciplines, especially in machine learning and artificial intelligence. As an alternative to representing uncertainty via a single probability measure, we consider credal sets (convex sets of probability measures). The geometric representation of credal sets as d-dimensional polytopes implies a geometric intuition about (epistemic) uncertainty. In this paper, we show that in the case of binary classification, the volume of the geometric representation of a credal set is a good measure of epistemic uncertainty, while for multi-class classification it ceases to be appealing. Our theoretical findings highlight the crucial role of specifying and employing appropriate measures of uncertainty in machine learning tasks and generally call for awareness of possible pitfalls.
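
The contrast is easy to see in coordinates; a toy illustration (the particular credal sets are invented for the example):

```python
import numpy as np

# Binary case: a credal set over two classes is an interval of
# probabilities for the positive class, and its volume is the length.
lower, upper = 0.35, 0.80
print("binary volume:", upper - lower)

# Three-class case: a credal set can be a segment inside the 2-simplex.
# Its 2-d volume is zero even though the epistemic uncertainty is not.
V = np.array([[0.2, 0.8, 0.0],
              [0.8, 0.2, 0.0]])  # extreme points of the credal set
print("affine dimension:", np.linalg.matrix_rank(V - V[0]))  # 1 < 2
```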

Subject: UAI.2023 - Oral


#17 Functional Causal Bayesian Optimization

Authors: Limor Gultchin, Virginia Aglietti, Alexis Bellot, Silvia Chiappa

TL;DR: We propose the functional causal Bayesian optimization method for finding functional interventions that optimize a target variable in a known causal graph.

We propose the functional causal Bayesian optimization method (fCBO) for finding interventions that optimize a target variable in a known causal graph. fCBO extends CBO to perform, in addition to hard interventions, functional interventions, which consist of setting a variable to be a deterministic function of a set of other variables in the graph. This is achieved by modelling the unknown objective with Gaussian processes whose inputs are defined in a reproducing kernel Hilbert space, thus allowing distances among vector-valued functions to be computed. In turn, this enables sequentially selecting functions to explore by maximizing an expected improvement acquisition functional, while keeping the typical computational tractability of standard BO settings. We show that functional interventions can attain better target effects compared to hard interventions and ensure that the found optimal policy is also optimal for sub-groups. We demonstrate the benefits of the method in a synthetic setting and in a real-world causal graph.

Subject: UAI.2023 - Oral


#18 Establishing Markov Equivalence in Cyclic Directed Graphs

Authors: Tom Claassen, Joris Mooij

TL;DR: A new ancestral perspective on the Cyclic Equivalence Theorem leads to a simplified and much faster procedure for deciding Markov equivalence between directed cyclic graphs.

We present a new, efficient procedure to establish Markov equivalence between directed graphs that may or may not contain cycles. It is based on the Cyclic Equivalence Theorem (CET) in the seminal works on cyclic models by Thomas Richardson in the mid '90s, but now rephrased from an ancestral perspective. The resulting characterization leads to a procedure for establishing Markov equivalence between graphs that no longer requires explicit tests for d-separation, leading to a significantly reduced algorithmic complexity. The conceptually simplified characterization may help to reinvigorate theoretical research towards sound and complete cyclic discovery in the presence of latent confounders.

Subject: UAI.2023 - Oral


#19 On Minimizing the Impact of Dataset Shifts on Actionable Explanations

Authors: Anna P. Meyer, Dan Ley, Suraj Srinivas, Himabindu Lakkaraju

The Right to Explanation is an important regulatory principle that allows individuals to request actionable explanations for algorithmic decisions. However, several technical challenges arise when providing such actionable explanations in practice. For instance, models are periodically retrained to handle dataset shifts, and this may in turn invalidate some of the previously prescribed explanations, thus rendering them unactionable. But it is unclear if and when such invalidations occur, and what factors determine explanation stability, i.e., whether an explanation remains unchanged amidst model retraining due to dataset shifts. In this paper, we address the aforementioned gaps and provide one of the first theoretical and empirical characterizations of the factors influencing explanation stability. To this end, we conduct rigorous theoretical analysis to demonstrate that model curvature and robustness, weight decay parameters, and the magnitude of the dataset shift are key factors that determine the extent of explanation (in)stability. Extensive experimentation with real-world datasets not only validates our theoretical results, but also demonstrates that the aforementioned factors dramatically impact the stability of explanations produced by various state-of-the-art methods.

Subject: UAI.2023 - Oral


#20 Probabilistic Flow Circuits: Towards Unified Deep Models for Tractable Probabilistic Inference

Authors: Sahil Sidheekh, Kristian Kersting, Sriraam Natarajan

TL;DR: A principled approach to building expressive and tractable generative models by integrating normalizing flows with probabilistic circuits.

We consider the problem of increasing the expressivity of probabilistic circuits by augmenting them with the successful generative models of normalizing flows. To this effect, we theoretically establish the requirement of decomposability for such combinations to retain tractability of the learned models. Our model, called Probabilistic Flow Circuits, essentially extends circuits by allowing for normalizing flows at the leaves. Our empirical evaluation clearly establishes the expressivity and tractability of this new class of probabilistic circuits.

Subject: UAI.2023 - Oral


#21 On Testability and Goodness of Fit Tests in Missing Data Models

Authors: Razieh Nabi, Rohit Bhattacharya

Significant progress has been made in developing identification and estimation techniques for missing data problems where modeling assumptions can be described via a directed acyclic graph. The validity of results using such techniques relies on the assumptions encoded by the graph holding true; however, verification of these assumptions has not received sufficient attention in prior work. In this paper, we provide new insights on the testable implications of three broad classes of missing data graphical models, and design goodness-of-fit tests for them. The classes of models explored are: sequential missing-at-random and missing-not-at-random models, which can be used for modeling longitudinal studies with dropout/censoring, and a no self-censoring model, which can be applied to cross-sectional studies and surveys.

Subject: UAI.2023 - Oral


#22 Keep-Alive Caching for the Hawkes process

Authors: Sushirdeep Narayana, Ian A. Kash

We study the design of caching policies in applications such as serverless computing where there is not a fixed size cache to be filled, but rather there is a cost associated with the time an item stays in the cache. We present a model for such caching policies which captures the trade-off between this cost and the cost of cache misses. We characterize optimal caching policies in general and apply this characterization by deriving a closed form for Hawkes processes. Since optimal policies for Hawkes processes depend on the history of arrivals, we also develop history-independent policies which achieve near-optimal average performance. We evaluate the performance of the optimal policy and approximate policies using simulations and a data trace of Azure Functions, Microsoft's FaaS (Function as a Service) platform for serverless computing.
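
A toy version of the trade-off is easy to simulate: a keep-alive policy with a fixed time-to-live (TTL) pays for cache occupancy on hits, and for both the expired window and a miss otherwise. The sketch below uses Poisson arrivals as a stand-in for the Hawkes process and fixed-TTL policies rather than the paper's optimal policy; all costs are illustrative.

```python
import numpy as np

def ttl_policy_cost(arrivals, ttl, c_time=1.0, c_miss=5.0):
    """Cost of keeping an item cached for `ttl` after each arrival."""
    gaps = np.diff(arrivals)
    hits = gaps <= ttl                   # next arrival found the item cached
    keep = c_time * np.where(hits, gaps, ttl).sum()
    miss = c_miss * (1 + (~hits).sum())  # the very first arrival is a miss
    return keep + miss

rng = np.random.default_rng(0)
arrivals = np.cumsum(rng.exponential(1.0, size=1_000))
for ttl in (0.5, 1.0, 2.0, 4.0):
    print(ttl, ttl_policy_cost(arrivals, ttl))
```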

Subject: UAI.2023 - Oral


#23 Parity Calibration

Authors: Youngseog Chung, Aaron Rumack, Chirag Gupta

TL;DR: We connect calibration in regression and classification with the notion of parity (whether the next observation increases or decreases w.r.t. the current observation), and propose methods to produce parity-calibrated predictions.

In a sequential prediction setting, a decision-maker may be primarily concerned with whether the continuous-valued future observation will increase or decrease compared to the current one, rather than the actual value of the future observation. We introduce the parity calibration framework, where the goal is to provide calibrated uncertainty estimates for the increase-decrease event in a timeseries. While these "parity" probabilities can be extracted from a distributional forecast, such a strategy does not work as expected and can have poor practical performance. We then observe that although the original task was regression, parity calibration can be expressed as binary calibration. Drawing on this connection, we use a recently proposed online binary calibration method to achieve parity calibration. We demonstrate the effectiveness of our method on real-world case studies in epidemiology, weather forecasting, and model-based control in nuclear fusion.
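
Extracting a parity probability from a distributional forecast is straightforward, which makes the observation that this naive strategy calibrates poorly all the more notable; a minimal sketch, assuming a Gaussian forecast:

```python
from scipy.stats import norm

# Distributional forecast for the next observation: N(mu, sigma^2).
mu, sigma = 1.3, 0.4
y_current = 1.1

# Parity probability: chance that the next value exceeds the current one.
p_up = 1.0 - norm.cdf(y_current, loc=mu, scale=sigma)
print(p_up)
```

The paper then treats such probabilities as outputs of a binary predictor and recalibrates them with an online binary calibration method.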

Subject: UAI.2023 - Oral


#24 Conditional Abstraction Trees for Sample-Efficient Reinforcement Learning

Authors: Mehdi Dadvar, Rashmeet Kaur Nayyar, Siddharth Srivastava

In many real-world problems, the learning agent needs to learn a problem's abstractions and solution simultaneously. However, most such abstractions need to be designed and refined by hand for different problems and domains of application. This paper presents a novel top-down approach for constructing state abstractions while carrying out reinforcement learning (RL). Starting with state variables and a simulator, the approach is domain-independent and dynamically computes an abstraction based on the dispersion of Q-values in abstract states as the agent continues acting and learning. Extensive empirical evaluation on multiple domains and problems shows that this approach automatically learns abstractions that are finely tuned to the problem, yield powerful sample efficiency, and result in the RL agent significantly outperforming existing approaches.

Subject: UAI.2023 - Oral


#25 Inference and Sampling of Point Processes from Diffusion Excursions

Authors: Ali Hasan, Yu Chen, Yuting Ng, Mohamed Abdelghani, Anderson Schneider, Vahid Tarokh

TL;DR: We develop methods to represent point processes in terms of excursions of a diffusion.

Point processes often have a natural interpretation with respect to a continuous process. We propose a point process construction that describes arrival time observations in terms of the state of a latent diffusion process. In this framework, we relate the return time of a diffusion in a continuous path space to new arrivals of the point process. These models arise in many disciplines, such as financial settings, where actions in a market are determined by a hidden continuous price, or neuroscience, where a latent stimulus generates spike trains. Based on developments in Itô's excursion theory, we describe computational methods for inferring and sampling from the point process derived from the diffusion process. We provide numerical examples using both simulated and real data to illustrate the approach. The proposed methods and framework provide a basis for interpreting point processes through the lens of a diffusion.
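
A crude picture of the construction: arrivals can be generated as successive return (first-passage) times of a latent diffusion. In the sketch below, a Brownian motion with drift restarts at the origin after hitting a barrier; the drift, barrier, step size, and restart rule are illustrative simplifications, not the paper's excursion-theoretic machinery.

```python
import numpy as np

rng = np.random.default_rng(0)
dt, drift, barrier = 1e-3, 1.0, 1.0
arrivals, t, x = [], 0.0, 0.0

# Each arrival is a first-passage time of the latent diffusion to the
# barrier, after which the path restarts at the origin.
while len(arrivals) < 20:
    x += drift * dt + rng.normal(scale=np.sqrt(dt))
    t += dt
    if x >= barrier:
        arrivals.append(t)
        x = 0.0

print(arrivals[:5])
```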

Subject: UAI.2023 - Spotlight