UAI.2023 - Accept

Total: 188

#1 Exploration for Free: How Does Reward Heterogeneity Improve Regret in Cooperative Multi-agent Bandits?

Authors: Xuchuang Wang, Lin Yang, Yu-Zhen Janice Chen, Xutong Liu, Mohammad Hajiesmaili, Don Towsley, John C.S. Lui

TL;DR: We study the free exploration mechanism in multi-agent multi-armed bandits with a heterogeneous reward model.

This paper studies a cooperative multi-agent bandit scenario in which the rewards observed by agents are heterogeneous---one agent's meat can be another agent's poison. Specifically, the total reward observed by each agent is the sum of two values: an arm-specific reward, capturing the intrinsic value of the arm, and a privately-known agent-specific reward, which captures the personal preferences/limitations of the agent. This heterogeneity in total reward leads to different local optimal arms across agents but creates an opportunity for *free exploration* in a cooperative setting---an agent can freely explore its local optimal arm with no regret and share these free observations with other agents who would incur regret by pulling that arm themselves, since it is not optimal for them. We first characterize a regret lower bound that captures free exploration, i.e., arms that can be freely explored contribute nothing to the lower bound. Then, we present a cooperative bandit algorithm that takes advantage of free exploration and achieves a near-optimal regret upper bound, matching the lower bound up to a constant factor. Lastly, we run numerical simulations to compare our algorithm with various baselines that lack free exploration.

Subject: UAI.2023 - Accept


#2 ViBid: Linear Vision Transformer with Bidirectional Normalization

Authors: Jeonggeun Song, Heung-Chang Lee

TL;DR: We empirically demonstrate the shortcomings of softmax-free attention and the significance of softmax in attention through BiNorm experiments. BiNorm is the simplest adaptation among current algorithms that change the matrix-multiplication order.

The vision transformer has achieved state-of-the-art performance in various vision tasks; however, its memory consumption is larger than that of previous convolutional-neural-network-based models because of the O(N^2) time and memory complexity of general self-attention. Many approaches reduce the complexity to O(N) to solve this problem; however, they either stack deep convolutional layers to retain locality or complicate the architecture, as in window attention, to compensate for the performance degradation. To solve these problems, we propose the ViBid algorithm, which resolves the O(N^2) complexity problem by replacing softmax with bidirectional normalization (BiNorm). In addition, it has a much simpler architecture than existing transformer models with O(N) complexity. Owing to this simple architecture, we were able to train at larger resolutions and obtained a lighter model with superior GPU throughput and competitive performance. ViBid can be used with any transformer method that uses queries, keys, and values (QKV) because of BiNorm, and it is quite universal due to its simple architectural structure.
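
To make the order-changing idea concrete, here is a minimal numpy sketch of softmax-free linear attention. The exact BiNorm definition is in the paper; the non-negative feature map and the row/column normalizations below are illustrative stand-ins.

```python
import numpy as np

def linear_attention_binorm_sketch(Q, K, V, eps=1e-6):
    """O(N) attention sketch: replace softmax with separate
    normalizations of Q and K, then change the multiplication order.
    Shapes: Q, K: (N, d), V: (N, d_v)."""
    Qp = np.maximum(Q, 0.0)  # non-negativity, as in many linear-attention variants
    Kp = np.maximum(K, 0.0)
    Qn = Qp / (Qp.sum(axis=1, keepdims=True) + eps)  # normalize queries row-wise
    Kn = Kp / (Kp.sum(axis=0, keepdims=True) + eps)  # normalize keys column-wise ("bidirectional")
    # (N,d) @ [(d,N) @ (N,d_v)] = (N,d) @ (d,d_v): cost is linear in N
    return Qn @ (Kn.T @ V)

N, d, dv = 1024, 64, 64
rng = np.random.default_rng(0)
out = linear_attention_binorm_sketch(rng.normal(size=(N, d)),
                                     rng.normal(size=(N, d)),
                                     rng.normal(size=(N, dv)))
print(out.shape)  # (1024, 64)
```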

Subject: UAI.2023 - Accept


#3 Pessimistic Model Selection for Deep Reinforcement Learning

Authors: Chao-Han Huck Yang, Zhengling Qi, Yifan Cui, Pin-Yu Chen

TL;DR: A pessimistic model selection approach for offline deep reinforcement learning with a theoretical guarantee is presented.

Deep Reinforcement Learning (DRL) has demonstrated great potential in solving sequential decision-making problems in many applications. Despite its promising performance, practical gaps exist when deploying DRL in real-world scenarios. One main barrier is the overfitting issue that leads to poor generalizability of the policy learned by DRL. In particular, for offline DRL with observational data, model selection is a challenging task, as there is no ground truth available for performance evaluation, in contrast to the online setting with simulated environments. In this work, we propose a pessimistic model selection (PMS) approach for offline DRL with a theoretical guarantee, which features a tuning-free framework for finding the best policy among a set of candidate models. Two refined approaches are also proposed to address the potential bias of DRL models in identifying the optimal policy. Numerical studies demonstrate the superior performance of our approach over existing methods.

Subject: UAI.2023 - Accept


#4 RDM-DC: Poisoning Resilient Dataset Condensation with Robust Distribution Matching

Authors: Tianhang Zheng, Baochun Li

Dataset condensation aims to condense the original training dataset into a small synthetic dataset for data-efficient learning. Recently proposed dataset condensation techniques allow model trainers with limited resources to learn acceptable deep learning models from a small amount of synthetic data. However, in an adversarial environment, when the original dataset is poisoned, dataset condensation may encode the poisoning information into the condensed synthetic dataset. To explore the vulnerability of dataset condensation to data poisoning, we revisit the state-of-the-art targeted data poisoning method and customize a targeted data poisoning algorithm for dataset condensation. By executing the two poisoning methods, we demonstrate that, when the synthetic dataset is condensed from a poisoned dataset, models trained on the synthetic dataset may predict the attack-targeted label for the targeted sample. To defend against data poisoning, we introduce the concept of poisoned deviation to quantify the poisoning effect. We further propose a poisoning-resilient dataset condensation algorithm with a calibration method to reduce poisoned deviation. Extensive evaluations demonstrate that our proposed algorithm can protect the synthetic dataset from data poisoning with only a minor performance drop.

Subject: UAI.2023 - Accept


#5 Online Estimation of Similarity Matrices with Incomplete Data

Authors: Fangchen Yu, Yicheng Zeng, Jianfeng Mao, Wenye Li

TL;DR: The paper proposes a series of matrix correction algorithms that estimate similarity matrices from incomplete data streams in different online scenarios.

The similarity matrix measures the pairwise similarities between a set of data points. It is an essential concept in data processing and is routinely used in practical applications. Obtaining the similarity matrix is usually trivial when the data points are completely observed. However, obtaining a high-quality similarity matrix often becomes hard when observations are incomplete, and harder still on sequential data streams. To address this challenge, we propose matrix correction algorithms that leverage the positive semi-definiteness of the similarity matrix to provide improved similarity estimation in both offline and online scenarios. Our approaches have a solid theoretical guarantee of performance and excellent potential for parallel execution on large-scale data. They also exhibit high effectiveness and efficiency in empirical evaluations, with significantly improved results over classical imputation-based methods, benefiting downstream applications with superior performance.
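
As a minimal illustration of the core idea (not the paper's online algorithms), the following numpy sketch repairs an indefinite similarity estimate by projecting it onto the PSD cone:

```python
import numpy as np

def psd_correct(S):
    """Project a symmetric similarity estimate onto the PSD cone by
    clipping negative eigenvalues, one standard correction exploiting
    positive semi-definiteness; the paper's variants refine this idea."""
    S = (S + S.T) / 2.0               # symmetrize first
    w, U = np.linalg.eigh(S)          # eigendecomposition
    w = np.clip(w, 0.0, None)         # drop the negative spectrum
    return U @ np.diag(w) @ U.T

# Pairwise similarities estimated from incomplete data can be
# mutually inconsistent and hence not PSD; correction restores it.
S_noisy = np.array([[1.0, 0.9, 0.1],
                    [0.9, 1.0, 0.95],
                    [0.1, 0.95, 1.0]])
print(np.linalg.eigvalsh(S_noisy).min())                # negative: not PSD
print(np.linalg.eigvalsh(psd_correct(S_noisy)).min())   # >= 0
```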

Subject: UAI.2023 - Accept


#6 Consistent Emphatic Temporal-Difference Learning

Authors: Jiamin He, Fengdi Che, Yi Wan, A. Rupam Mahmood

TL;DR: We propose the first practical consistent off-policy TD algorithm and show its competitive performance.

Off-policy policy evaluation has been a critical and challenging problem in reinforcement learning, and Temporal-Difference (TD) learning is one of the most important approaches for addressing it. Notably, Full Importance-Sampling TD is the only existing off-policy TD method that is guaranteed to find the on-policy TD fixed point in the linear function approximation setting but, unfortunately, has high variance and is scarcely practical. This notorious high-variance issue motivated the introduction of Emphatic TD, which tames the variance but has a biased fixed point. Inspired by these two methods, we propose a new consistent algorithm with a transient bias, which strikes a balance between bias and variance. Further, we unify the new algorithm with several existing algorithms and obtain a new family of consistent algorithms called *Consistent Emphatic TD* (CETD(λ, β, ν)), which can control a smooth bias-variance trade-off by varying the speed at which the transient bias fades. Through theoretical analysis and experiments on a didactic example, we validate the consistency of CETD(λ, β, ν). Moreover, we show that CETD(λ, β, ν) converges faster to the lowest error in a complex task with high variance.

Subject: UAI.2023 - Accept


#7 Learning Choice Functions with Gaussian Processes

Authors: Alessio Benavoli, Dario Azzimonti, Dario Piga

TL;DR: We develop a Gaussian-process-based method to learn choice functions from choice data via Pareto rationalisation.

In consumer theory, ranking available objects by means of preference relations yields the most common description of individual choices. However, preference-based models assume that individuals: (1) give their preferences only between pairs of objects; (2) are always able to pick the most preferred object. In many situations, they may instead be choosing out of a set with more than two elements and, because of lack of information and/or incomparability (objects with contradictory characteristics), they may not be able to select a single most preferred object. To address these situations, we need a choice model that allows an individual to express a set-valued choice. Choice functions provide such a mathematical framework. We propose a Gaussian Process model to learn choice functions from choice data. The proposed model assumes a multiple-utility representation of a choice function based on the concept of Pareto rationalisation, and we derive a strategy to learn both the number and the values of these latent multiple utilities. Simulation experiments demonstrate that the proposed model outperforms state-of-the-art methods.

Subject: UAI.2023 - Accept


#8 Exploiting Inferential Structure in Neural Processes

Authors: Dharmesh Tailor, Mohammad Emtiyaz Khan, Eric Nalisnick

Neural Processes (NPs) are appealing due to their ability to perform fast adaptation based on a context set. This set is encoded by a latent variable, which is often assumed to follow a simple distribution. However, in real-world settings, the context set may be drawn from richer distributions having multiple modes, heavy tails, etc. In this work, we provide a framework that allows NPs' latent variable to be given a rich prior defined by a graphical model. These distributional assumptions directly translate into an appropriate aggregation strategy for the context set. Moreover, we describe a message-passing procedure that still allows for end-to-end optimization with stochastic gradients. We demonstrate the generality of our framework by using mixture and Student-t assumptions that yield improvements in function modelling and test-time robustness.

Subject: UAI.2023 - Accept


#9 Learning Robust Representation for Reinforcement Learning with Distractions by Reward Sequence Prediction

Authors: Qi Zhou, Jie Wang, Qiyuan Liu, Yufei Kuang, Wengang Zhou, Houqiang Li

TL;DR: Our method learns robust representations by predicting reward sequences via a novel TD-style algorithm, achieving state-of-the-art sample efficiency and generalization in environments with distractions.

Reinforcement learning algorithms have achieved impressive success in learning behaviors from pixels. However, their application to real-world tasks remains challenging because of their sensitivity to visual distractions (e.g., changes in viewpoint and lighting). A major reason is that the learned representations often overfit to task-irrelevant information. By comparing several representation learning methods, we find that the key to robust representation learning is the choice of prediction targets. Therefore, we propose a novel representation learning approach---namely, Reward Sequence Prediction (RSP)---that uses reward sequences or their transforms (e.g., the discrete-time Fourier transform) as prediction targets. RSP can learn robust representations efficiently because reward sequences rarely contain task-irrelevant information while providing sufficient supervision signal to accelerate representation learning. An appealing feature is that RSP makes no assumption about the type of distraction and can thus improve performance even when multiple types of distractions exist. We evaluate our approach in the Distracting Control Suite. Experiments show that our method achieves state-of-the-art sample efficiency and generalization ability in tasks with distractions.
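
To illustrate what such a target looks like, here is a small numpy sketch that turns a reward sequence into a low-dimensional Fourier-coefficient target. The TD-style procedure that estimates these targets online, which is central to the paper, is omitted, and the truncation to a few coefficients is an assumption for illustration.

```python
import numpy as np

def rsp_targets(rewards, k_freqs=4):
    """Turn an upcoming reward sequence into a low-dimensional target:
    the first k_freqs coefficients of its discrete Fourier transform
    (one of the transforms the paper mentions)."""
    coeffs = np.fft.rfft(np.asarray(rewards, dtype=float))
    return np.concatenate([coeffs.real[:k_freqs], coeffs.imag[:k_freqs]])

# A 16-step reward sequence from some trajectory segment:
rewards = [0, 0, 1, 0, 0, 1, 0, 0, 1, 0, 0, 1, 0, 0, 1, 0]
print(rsp_targets(rewards))   # distraction-free 8-dim supervision target
```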

Subject: UAI.2023 - Accept


#10 Information Theoretic Clustering via Divergence Maximization among Cluster Distributions

Authors: Sahil Garg, Mina Dalirrooyfard, Anderson Schneider, Yeshaya Adler, Yuriy Nevmyvaka, Yu Chen, Fengpei Li, Guillermo Cecchi

Information-theoretic clustering is one of the most promising and principled approaches to finding clusters with minimal a priori assumptions. The key criterion therein is to maximize the mutual information between the data points and their cluster labels. We instead propose to maximize the Kullback–Leibler divergence between the underlying distributions associated with clusters (referred to as cluster distributions). We show this to be equivalent to optimizing the mutual information criterion while simultaneously maximizing the cross entropy between the cluster distributions. For practical efficiency, we propose to empirically estimate the KL divergence between clusters in its dual form, leveraging deep neural nets as the dual function approximator. Remarkably, our theoretical analysis establishes that estimating the divergence measure in its dual form reduces the clustering problem to that of optimally finding k − 1 cut points for k clusters in the 1-D dual functional space. Overall, our approach enables linear-time clustering algorithms with theoretical guarantees of near-optimality, owing to the submodularity of the objective. We show the empirical superiority of our approach w.r.t. current state-of-the-art methods on the challenging task of clustering noisy time series as observed in domains such as neuroscience, healthcare, financial markets, and spatio-temporal environmental dynamics.

Subject: UAI.2023 - Accept


#11 In- or Out-of-Distribution Detection via Dual Divergence Estimation

Authors: Sahil Garg, Sanghamitra Dutta, Mina Dalirrooyfard, Anderson Schneider, Yuriy Nevmyvaka

Detecting out-of-distribution (OOD) samples is a problem of practical importance for the reliable use of deep neural networks (DNNs) in production settings. The corollary to this problem is the detection of in-distribution (ID) samples, which is applicable to domain adaptation scenarios for augmenting a training set with ID samples from other datasets, or to continual learning for replay from the past. For both ID and OOD detection, we propose a principled yet simple approach: (empirically) estimating the KL divergence, in its dual form, of a given test set w.r.t. a known set of ID samples, so as to quantify each test sample's individual contribution to the divergence measure and accordingly detect it as OOD or ID. Our approach is compute-efficient and enjoys strong theoretical guarantees. For WideResnet101 and ViT-L-16, with the ImageNet-1k dataset as the ID benchmark, we evaluate the proposed OOD detector on 51 test (OOD) datasets and observe drastically and consistently lower false positive rates than all competitive methods. Moreover, the proposed ID detector is evaluated, using ECG and stock price datasets, on the task of data augmentation in domain adaptation and continual learning settings, and we observe higher efficacy compared to relevant baselines.
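
The dual (Donsker–Varadhan) form that this paper and #10 build on decomposes into per-sample terms, which is what makes per-sample ID/OOD scoring possible. A self-contained numpy sketch, using two Gaussians whose optimal critic is known in closed form instead of a trained neural critic (an illustrative assumption):

```python
import numpy as np

rng = np.random.default_rng(0)
mu_p, mu_q, n = 1.0, 0.0, 200_000
xp = rng.normal(mu_p, 1.0, n)   # "test" samples ~ P
xq = rng.normal(mu_q, 1.0, n)   # known ID samples ~ Q

def f_star(x):
    """Optimal DV critic log(p/q) for two unit-variance Gaussians."""
    return (mu_p - mu_q) * x - (mu_p**2 - mu_q**2) / 2

# Donsker-Varadhan dual form: KL(P||Q) = sup_f E_P[f] - log E_Q[exp f];
# the first term decomposes over test samples, so each sample's critic
# value f_star(x) acts as its individual ID/OOD score.
dv = f_star(xp).mean() - np.log(np.exp(f_star(xq)).mean())
print(dv, 0.5 * (mu_p - mu_q) ** 2)   # both ~0.5, the analytic KL
```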

Subject: UAI.2023 - Accept


#12 On Identifiability of Conditional Causal Effects

Authors: Yaroslav Kivva, Jalal Etesami, Negar Kiyavash

We address the problem of identifiability of an arbitrary conditional causal effect given both the causal graph and a set of observational and/or interventional distributions of the form Q[S] := P(S | do(V∖S)), where V denotes the set of all observed variables and S ⊆ V. We call this problem conditional generalized identifiability (c-gID in short) and prove the completeness of Pearl's do-calculus for the c-gID problem by providing a sound and complete algorithm for it. This work revisits the c-gID problem of Lee et al. [2020] and Correa et al. [2021] by explicitly adding the positivity assumption, which is crucial for identifiability. It extends the results of [Lee et al., 2019, Kivva et al., 2022] on general identifiability (gID), which studied the problem for unconditional causal effects, and of Shpitser and Pearl [2006b] on identifiability of conditional causal effects given merely the observational distribution P(V), as our algorithm generalizes the algorithms proposed in [Kivva et al., 2022] and [Shpitser and Pearl, 2006b].

Subject: UAI.2023 - Accept


#13 Personalized Federated Domain Adaptation for Item-to-Item Recommendation

Authors: Ziwei Fan, Trong Nghia Hoang, Hao Ding, Anoop Deoras

TL;DR: We propose and investigate a personalized federated modeling framework based on GNNs to summarize, assemble and adapt recommendation patterns across markets with heterogeneous customer behaviors into effective local models.

Item-to-Item (I2I) recommendation is an important function in most recommendation systems, which generates replacement or complement suggestions for a particular item based on its semantic similarities to other cataloged items. Given that subsets of items in a recommendation system might be co-interacted with by the same set of customers, graph-based models, such as graph neural networks (GNNs), provide a natural framework to combine, ingest and extract valuable insights from such high-order relational interactions between cataloged items, as well as their metadata features, as has been shown in many recent studies. However, learning GNNs effectively for I2I requires ingesting a large amount of relational data, which might not always be available, especially in new, emerging market segments. To mitigate this data bottleneck, we postulate that recommendation patterns learned from existing mature market segments (with private data) could be adapted to build effective warm-start models for emerging ones. To achieve this, we propose and investigate a personalized federated modeling framework based on GNNs to summarize, assemble and adapt recommendation patterns across market segments with heterogeneous customer behaviors into effective local models. Our key contribution is a personalized graph adaptation model that bridges the gap between recent literature on federated GNNs and (non-graph) personalized federated learning, which either does not optimize for the adaptability of the federated model or is restricted to local models with homogeneous parameterization, excluding GNNs with heterogeneous local graphs. The effectiveness of our framework is demonstrated on a real-world dataset on multiple item categories spanning multiple market segments.

Subject: UAI.2023 - Accept


#14 Memory Mechanism for Unsupervised Anomaly Detection

Authors: Jiahao Li, Yiqiang Chen, Yunbing Xing

TL;DR: This paper proposes a memory mechanism that enables the model to learn to recognize unknowns.

Unsupervised anomaly detection is a binary classification task that detects anomalies in unseen samples given only unlabeled normal data. Reconstruction-based approaches are widely used: they minimize reconstruction error on training data to learn normal patterns and quantify the degree of anomaly by the reconstruction error on testing data. However, this approach tends to miss anomalies when the normal data contains multiple patterns, because the model generalizes unrestrictedly beyond the normal patterns, even to anomalous ones. In this paper, we propose a memory mechanism that memorizes typical normal patterns through a capacity-controlled, external, differentiable matrix, so that the model's generalization to anomalies is limited by retrieval from the matrix. We achieve state-of-the-art performance on several public benchmarks.
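
A minimal numpy sketch of a capacity-controlled memory of this flavor, with softmax addressing over stored normal patterns and hard shrinkage to limit generalization; the paper's exact parameterization may differ.

```python
import numpy as np

def memory_retrieve(z, M, tau=0.1, shrink=0.02):
    """Rebuild an encoding z from a fixed-capacity memory M whose rows
    store typical normal patterns: cosine addressing, temperature-tau
    softmax, and hard shrinkage so only a few slots contribute."""
    sims = M @ z / (np.linalg.norm(M, axis=1) * np.linalg.norm(z) + 1e-8)
    w = np.exp(sims / tau)
    w /= w.sum()                       # softmax addressing over slots
    w = np.where(w > shrink, w, 0.0)   # shrinkage limits generalization
    w /= w.sum() + 1e-8
    return w @ M                       # convex combination of normal patterns

rng = np.random.default_rng(0)
M = rng.normal(size=(10, 16))                # 10 memorized normal patterns
normal = M[3] + 0.05 * rng.normal(size=16)   # near a stored pattern
anomaly = rng.normal(size=16)                # unrelated input
# Anomalies must be expressed through normal slots, so they
# reconstruct poorly, giving a larger anomaly score.
print(np.linalg.norm(memory_retrieve(normal, M) - normal))    # small
print(np.linalg.norm(memory_retrieve(anomaly, M) - anomaly))  # large
```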

Subject: UAI.2023 - Accept


#15 Split, Count, and Share: A Differentially Private Set Intersection Cardinality Estimation Protocol

Authors: Michael Purcell, Yang Li, Kee Siong Ng

TL;DR: We present a simple privacy-preserving protocol for estimating the cardinality of the intersection of two sets.

We describe a simple two-party protocol in which each party contributes a set as input. The output of the protocol is an estimate of the cardinality of the intersection of the two input sets. We show that our protocol is efficient and secure. In particular, we show that the space complexity and communication complexity are constant, the time complexity for each party is proportional to the size of their input set, and that our protocol is differentially private. We also analyze the distribution of the output of the protocol, deriving both its asymptotic distribution and finite-sample bounds on its tail probabilities. These analyses show that, when the input sets are large, our protocol produces accurate set intersection cardinality estimates. As such, we claim that our protocol is an attractive alternative to traditional private set intersection cardinality (PSI-CA) protocols when the input sets are large, exact precision is not required, and differential privacy on its own can provide sufficient protection to the underlying sensitive data.
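
To see the shape of such a protocol, here is a toy numpy sketch of the inclusion-exclusion skeleton |A∩B| = |A| + |B| − |A∪B| using linear-counting bitmaps, with Laplace noise standing in for the protocol's calibrated noise. This is not the paper's protocol, and the privacy calibration (the sensitivity analysis) is precisely the paper's contribution; the bitmap size and noise scale below are arbitrary assumptions.

```python
import numpy as np

def bitmap(items, m, seed=42):
    """Linear-counting bitmap: hash every item into one of m cells."""
    b = np.zeros(m, dtype=bool)
    for x in items:
        b[hash((seed, x)) % m] = True
    return b

def lc_estimate(b):
    """Linear counting: n is approximately -m * ln(fraction of empty cells)."""
    m = b.size
    zeros = np.count_nonzero(~b)
    return -m * np.log(max(zeros, 1) / m)

# Each party shares a bitmap and a noisy cardinality; the union
# bitmap is the element-wise OR of the shared bitmaps.
A, B = set(range(0, 6000)), set(range(4000, 10000))
m = 1 << 16
bA, bB = bitmap(A, m), bitmap(B, m)
rng = np.random.default_rng(7)
nA = lc_estimate(bA) + rng.laplace(scale=10.0)   # placeholder noise scale
nB = lc_estimate(bB) + rng.laplace(scale=10.0)
nU = lc_estimate(bA | bB) + rng.laplace(scale=10.0)
print(round(nA + nB - nU))   # close to 2000 = |A ∩ B|
```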

Subject: UAI.2023 - Accept


#16 How to Use Dropout Correctly on Residual Networks with Batch Normalization

Authors: Bum Jun Kim, Hyeyeon Choi, Hyeonah Jang, Donggeon Lee, Sang Woo Kim

TL;DR: In this study, we investigate the correct position at which to apply dropout.

For the stable optimization of deep neural networks, regularization methods such as dropout and batch normalization have been used in various tasks. Nevertheless, the correct position at which to apply dropout has rarely been discussed, and different positions have been employed depending on the practitioner. In this study, we investigate the correct position at which to apply dropout. We demonstrate that for a residual network with batch normalization, applying dropout at certain positions increases the performance, whereas applying it at other positions decreases the performance. Based on theoretical analysis, we provide the following guideline for the correct position: apply one dropout after the last batch normalization but before the last weight layer in the residual branch. We provide detailed theoretical explanations to support this claim and demonstrate them through module tests. In addition, we investigate the correct position of dropout in the head that produces the final prediction. Although the current consensus is to apply dropout after global average pooling, we prove that applying dropout before global average pooling leads to a more stable output. The proposed guidelines are validated through experiments with different datasets and models.
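
The two guidelines translate directly into module layout. A PyTorch sketch, where the channel sizes, dropout rate, and pre-activation block style are illustrative assumptions; the dropout positions themselves follow the paper's guideline.

```python
import torch
import torch.nn as nn

class ResidualBranchWithDropout(nn.Module):
    """Residual branch following the guideline: one dropout after the
    last batch normalization but before the last weight layer."""
    def __init__(self, c, p=0.1):
        super().__init__()
        self.branch = nn.Sequential(
            nn.BatchNorm2d(c), nn.ReLU(),
            nn.Conv2d(c, c, 3, padding=1),
            nn.BatchNorm2d(c), nn.ReLU(),
            nn.Dropout(p),                 # <- recommended position
            nn.Conv2d(c, c, 3, padding=1),
        )

    def forward(self, x):
        return x + self.branch(x)

class Head(nn.Module):
    """Head guideline: dropout BEFORE global average pooling."""
    def __init__(self, c, num_classes, p=0.1):
        super().__init__()
        self.drop = nn.Dropout(p)
        self.pool = nn.AdaptiveAvgPool2d(1)
        self.fc = nn.Linear(c, num_classes)

    def forward(self, x):
        x = self.pool(self.drop(x)).flatten(1)
        return self.fc(x)

x = torch.randn(2, 64, 8, 8)
y = Head(64, 10)(ResidualBranchWithDropout(64)(x))
print(y.shape)  # torch.Size([2, 10])
```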

Subject: UAI.2023 - Accept


#17 DeepGD3: Unknown-Aware Deep Generative/Discriminative Hybrid Defect Detector for PCB Soldering Inspection

Authors: Ching-Wen Ma, Yanwei Liu

TL;DR: A generative/discriminative hybrid model effectively addresses the performance degradation that arises when test samples come from new components for which no defective sample is available.

This paper presents a novel approach for detecting soldering defects in Printed Circuit Boards (PCBs) composed mainly of Surface Mount Technology (SMT) components, using advanced computer vision and deep learning techniques. The main challenge addressed is the detection of soldering defects in new components for which only examples of good soldering are available at the model training phase. To meet industrial quality standards, we must keep the leakage rate (i.e., the miss-detection rate) low. To address this, we design the system to be "unknown-aware" with a low unknown rate and utilize the knowledge gained from the soldering examples of old components to detect soldering defects in new components. We evaluated the method on a real-world dataset from an electronics company. It significantly reduces the leakage rate from 1.827% ± 3.063% to 0.063% ± 0.075% with an unknown rate of 3.706% ± 2.270%, compared to the baseline approach.

Subject: UAI.2023 - Accept


#18 Low-Rank Matrix Recovery with Unknown Correspondence

Authors: Zhiwei Tang, Tsung-Hui Chang, Xiaojing Ye, Hongyuan Zha

TL;DR: We formulate a new matrix recovery problem that addresses a common obstacle in utilizing heterogeneous data and develop an efficient algorithm to solve it.

We study a matrix recovery problem with unknown correspondence: given the observation matrix M_o = [A, P̃B], where P̃ is an unknown permutation matrix, we aim to recover the underlying matrix M = [A, B]. Such a problem commonly arises in applications where heterogeneous data are utilized and the correspondence among them is unknown, e.g., due to data mishandling or privacy concerns. We show that, in some applications, it is possible to recover M by solving a nuclear norm minimization problem. Moreover, under a proper low-rank condition on M, we derive a non-asymptotic error bound for the recovery of M. We propose an algorithm, M3O (Matrix recovery via Min-Max Optimization), which recasts this combinatorial problem as a continuous minimax optimization problem and solves it by proximal gradient with a Max-Oracle. M3O can also be applied to a more general scenario where there are missing entries in M_o and multiple groups of data with distinct unknown correspondences. Experiments on simulated data, the MovieLens 100K dataset and the Yale B database show that M3O achieves state-of-the-art performance over several baselines and can recover the ground-truth correspondence with high accuracy.
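
The Max-Oracle step has a clean classical form: for fixed M, maximizing over the permutation is a linear assignment problem. A numpy/scipy sketch of just this ingredient (the proximal-gradient outer loop and the nuclear-norm objective are omitted):

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def max_oracle(A, B):
    """Max-oracle over permutation matrices: argmax_P <A, P B> is a
    linear assignment problem, solvable exactly in polynomial time."""
    C = A @ B.T                             # C[i, j]: gain of matching A row i with B row j
    rows, cols = linear_sum_assignment(-C)  # maximize total gain
    P = np.zeros((A.shape[0], B.shape[0]))
    P[rows, cols] = 1.0
    return P

rng = np.random.default_rng(1)
A = rng.normal(size=(8, 32))
B = A[rng.permutation(8)]                   # B is a row-shuffled copy of A
P = max_oracle(A, B)
print(np.allclose(P @ B, A))                # True: the shuffle is recovered
```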

Subject: UAI.2023 - Accept


#19 Nonconvex Stochastic Scaled-Gradient Descent and Generalized Eigenvector Problems

Authors: Chris Junchi Li, Michael Jordan

Motivated by the problem of online canonical correlation analysis, we propose the \emph{Stochastic Scaled-Gradient Descent} (SSGD) algorithm for minimizing the expectation of a stochastic function over a generic Riemannian manifold. SSGD generalizes the idea of projected stochastic gradient descent and allows the use of scaled stochastic gradients instead of stochastic gradients. In the special case of a spherical constraint, which arises in generalized eigenvector problems, we establish a nonasymptotic finite-sample bound of 1/T, and show that this rate is minimax optimal, up to a polylogarithmic factor of relevant parameters. On the asymptotic side, a novel trajectory-averaging argument allows us to achieve local asymptotic normality with a rate that matches that of Ruppert-Polyak-Juditsky averaging. We bring these ideas together in an application to online canonical correlation analysis, deriving, for the first time in the literature, an optimal one-time-scale algorithm with an explicit rate of local asymptotic convergence to normality. Numerical studies of canonical correlation analysis are also provided for synthetic data.

Subject: UAI.2023 - Accept


#20 MDPose: Real-Time Multi-Person Pose Estimation via Mixture Density Model

Authors: Seunghyeon Seo, Jaeyoung Yoo, Jihye Hwang, Nojun Kwak

TL;DR: We reformulate the multi-person pose estimation task as density estimation, enabling real-time instance-aware keypoint estimation without any additional instance identification process.

One of the major challenges in multi-person pose estimation is instance-aware keypoint estimation. Previous methods address this problem by leveraging an off-the-shelf detector, a heuristic post-grouping process or an explicit instance identification process, hindering further improvements in inference speed, which is an important factor for practical applications. From the statistical point of view, those additional processes for identifying instances are necessary to bypass learning the high-dimensional joint distribution of human keypoints, which is a critical factor for another major challenge, the occlusion scenario. In this work, we propose a novel framework for single-stage instance-aware pose estimation by modeling the joint distribution of human keypoints with a mixture density model, termed MDPose. Our MDPose estimates the distribution of human keypoint coordinates using a mixture density model with an instance-aware keypoint head consisting simply of 8 convolutional layers. It is trained by minimizing the negative log-likelihood of the ground-truth keypoints. Also, we propose a simple yet effective training strategy, Random Keypoint Grouping (RKG), which significantly alleviates the underflow problem, leading to successful learning of relations between keypoints. On the OCHuman dataset, which consists of images with highly occluded people, our MDPose achieves state-of-the-art performance by successfully learning the high-dimensional joint distribution of human keypoints. Furthermore, our MDPose shows a significant improvement in inference speed with competitive accuracy on MS COCO, a widely used human keypoint dataset, thanks to the proposed much simpler single-stage pipeline.
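
For intuition, the training loss has the standard mixture-density form. A numpy sketch of the negative log-likelihood for one keypoint vector, where the isotropic Gaussian components and the dimensions are illustrative assumptions; log-sum-exp is the usual guard against exactly the kind of underflow the paper discusses.

```python
import numpy as np

def mixture_nll(x, pi, mu, sigma):
    """Negative log-likelihood of a keypoint vector x under a Gaussian
    mixture, the kind of loss a mixture-density pose model minimizes.
    x: (d,), pi: (K,), mu: (K, d), sigma: (K,)."""
    d = x.shape[0]
    sq = ((x - mu) ** 2).sum(axis=1)                   # (K,) squared distances
    log_comp = (np.log(pi)
                - 0.5 * d * np.log(2 * np.pi * sigma ** 2)
                - sq / (2 * sigma ** 2))
    return -np.logaddexp.reduce(log_comp)              # stable log-sum-exp

rng = np.random.default_rng(0)
K, d = 3, 34                              # e.g., 17 keypoints x 2 coordinates
pi = np.full(K, 1.0 / K)
mu = rng.normal(size=(K, d))
x = mu[1] + 0.01 * rng.normal(size=d)     # sample near component 1
print(mixture_nll(x, pi, mu, np.full(K, 0.1)))
```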

Subject: UAI.2023 - Accept


#21 Bayesian Numerical Integration with Neural Networks

Authors: Katharina Ott, Michael Tiemann, Philipp Hennig, Francois-Xavier Briol

TL;DR: We propose a novel neural-network architecture for numerical integration based on the Stein operator, with the Bayesian posterior approximated via the Laplace approximation.

Bayesian probabilistic numerical methods for numerical integration offer significant advantages over their non-Bayesian counterparts: they can encode prior information about the integrand, and they can quantify uncertainty over estimates of an integral. However, the most popular algorithm in this class, Bayesian quadrature, is based on Gaussian process models and is therefore associated with a high computational cost. To improve scalability, we propose an alternative approach based on Bayesian neural networks, which we call Bayesian Stein networks. The key ingredients are a neural network architecture based on Stein operators and an approximation of the Bayesian posterior based on the Laplace approximation. We show that this leads to orders-of-magnitude speed-ups on the popular Genz functions benchmark, on challenging problems arising in the Bayesian analysis of dynamical systems, and in the prediction of energy production for a large-scale wind farm.
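
The Stein-operator trick is what turns regression into integration: for suitable u, (Su)(x) = u'(x) + u(x) d/dx log p(x) integrates to zero under p, so fitting f(x) ≈ b + (Su)(x) makes the learned offset b an estimate of the integral of f under p. A 1-D numpy sketch with a polynomial stand-in for the network (the paper uses a neural u and a Laplace approximation for uncertainty):

```python
import numpy as np

# p = N(0, 1), so d/dx log p(x) = -x; integrand f(x) = x^2 with E_p[f] = 1.
rng = np.random.default_rng(0)
x = rng.normal(size=500)
f = x ** 2

# With u(x) = a1*x + a2*x^2 + a3*x^3, the Stein features are analytic:
Su = np.stack([1 - x**2,        # Su for u = x
               2*x - x**3,      # Su for u = x^2
               3*x**2 - x**4],  # Su for u = x^3
              axis=1)
design = np.column_stack([np.ones_like(x), Su])

# Fit f(x) ~ b + (Su)(x); since E_p[Su] = 0, the offset b is the integral.
coef, *_ = np.linalg.lstsq(design, f, rcond=None)
print(coef[0])   # ~1.0, the integral estimate
```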

Subject: UAI.2023 - Accept


#22 Fast Heterogeneous Federated Learning with Hybrid Client Selection

Authors: Duanxiao Song, Guangyuan Shen, Dehong Gao, Libin Yang, Xukai Zhou, Shirui Pan, Wei Lou, Fang Zhou

Client selection schemes are widely adopted to handle communication-efficiency problems in recent studies of Federated Learning (FL). However, the large variance of the model updates aggregated from randomly selected, unrepresentative subsets directly slows FL convergence. We present a novel clustering-based client selection scheme that accelerates FL convergence through variance reduction. Simple yet effective schemes are designed to improve the clustering effect and control its fluctuation, thereby generating client subsets with a certain representativeness of sampling. Theoretically, we demonstrate the improvement of the proposed scheme in variance reduction. We also present a tighter convergence guarantee for the proposed method thanks to the variance reduction. Experimental results confirm the superior efficiency of our scheme compared to alternatives.

Subject: UAI.2023 - Accept


#23 Implicit Training of Energy Models for Structured Prediction

Author: Shiv Shankar

Much research in deep learning is devoted to developing new models and training procedures. Training objectives, on the other hand, have received much less attention and are often restricted to combinations of standard losses. When the objective aligns well with the evaluation metric, this is not a major issue. However, when dealing with complex structured outputs, the ideal objective can be hard to optimize and the efficacy of the usual objectives as a proxy for the true objective can be questionable. In this work, we argue that the existing inference-network-based structured prediction methods~\citep{tu-18, tu2020improving} are indirectly learning to optimize a dynamic loss objective parameterized by the energy model. We then explore using implicit-gradient-based techniques to learn the corresponding dynamic objectives. Our experiments show that implicitly learning a dynamic loss landscape is an effective method for improving model performance in structured prediction.

Subject: UAI.2023 - Accept


#24 Two-Stage Holistic and Contrastive Explanation of Image Classification

Authors: Weiyan Xie, Xiao-Hui Li, Zhi LIN, Leonard Poon, Caleb Chen Cao, Nevin L. Zhang

TL;DR: We propose a contrastive whole-output explanation method for image classification.

Explanations for the outputs of deep neural network classifiers are essential in promoting trust and comprehension among users. Conventional methods often offer explanations only for a single class in the output and neglect other classes with high probabilities, resulting in a limited view of the model's behavior. In this paper, we propose a holistic explanation method for image classification. It not only facilitates an overall understanding of model behavior, but also provides a framework in which one can examine the evidence for discriminating between competing classes, thereby yielding contrastive explanations. We demonstrate the advantages of the new method over baselines in terms of both faithfulness to the model and interpretability to users. The source code will be made available to the public upon publication of the paper.

Subject: UAI.2023 - Accept


#25 Approximate Thompson Sampling via Epistemic Neural Networks

Authors: Ian Osband, Zheng Wen, Seyed Mohammad Asghari, Vikranth Dwaracherla, Morteza Ibrahimi, Xiuyuan Lu, Benjamin Van Roy

TL;DR: Better joint predictions lead to better decisions in deep RL.

Thompson sampling (TS) is a popular heuristic for action selection, but it requires sampling from a posterior distribution. Unfortunately, this can become computationally intractable in complex environments, such as those modeled using neural networks. Approximate posterior samples can produce effective actions, but only if they reasonably approximate joint predictive distributions of outputs across inputs. Notably, accuracy of marginal predictive distributions does not suffice. Epistemic neural networks (ENNs) are designed to produce accurate joint predictive distributions. We compare a range of ENNs through computational experiments that assess their performance in approximating TS across bandit and reinforcement learning environments. The results indicate that ENNs serve this purpose well and illustrate how the quality of joint predictive distributions drives performance. Further, we demonstrate that the *epinet*---a small additive network that estimates uncertainty---matches the performance of large ensembles at orders of magnitude lower computational cost. This enables effective application of TS with computation that scales gracefully to complex environments.
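
A toy numpy sketch of the additive structure: the output is base(x) plus a small network of the trunk features and a sampled epistemic index z, and approximate TS draws one z and acts greedily. The linear layers, sizes, and index distribution are illustrative assumptions; the real epinet architecture and training are in the paper.

```python
import numpy as np

def enn_forward(x, base, epi_w, z):
    """ENN sketch: logits = base(x) + epinet(features, z), where z is an
    epistemic index drawn per forward pass. Varying z produces the joint
    predictive diversity approximate TS needs."""
    features = np.tanh(base["h"] @ x)                 # shared trunk features
    base_logits = base["out"] @ features
    epi_logits = epi_w @ np.concatenate([features, z])  # index joins the features
    return base_logits + epi_logits

rng = np.random.default_rng(0)
d, h, k, dz = 8, 16, 3, 4
base = {"h": rng.normal(size=(h, d)), "out": rng.normal(size=(k, h))}
epi_w = 0.1 * rng.normal(size=(k, h + dz))
x = rng.normal(size=d)
# Approximate Thompson sampling: sample one index, act greedily on it.
z = rng.normal(size=dz)
print(int(np.argmax(enn_forward(x, base, epi_w, z))))
```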

Subject: UAI.2023 - Accept