UAI.2024 - Accept

Total: 161

#1 Learning Topological Representations with Bidirectional Graph Attention Network for Solving Job Shop Scheduling Problem

Authors: Cong Zhang, Zhiguang Cao, Yaoxin Wu, Wen Song, Jing Sun

Existing learning-based methods for solving job shop scheduling problems (JSSP) usually use off-the-shelf GNN models tailored to undirected graphs and neglect the rich and meaningful topological structures of disjunctive graphs (DGs). This paper proposes the topology-aware bidirectional graph attention network (TBGAT), a novel GNN architecture based on the attention mechanism, to embed the DG for solving JSSP in a local search framework. Specifically, TBGAT embeds the DG from a forward and a backward view, respectively, where the messages are propagated by following the different topologies of the views and aggregated via graph attention. Then, we propose a novel operator based on the message-passing mechanism to calculate the forward and backward topological sorts of the DG, which are the features for characterizing the topological structures and exploited by our model. In addition, we theoretically and experimentally show that TBGAT has linear computational complexity with respect to the number of jobs and machines, respectively, strengthening our method's practical value. Besides, extensive experiments on five synthetic datasets and seven classic benchmarks show that TBGAT achieves new SOTA results by outperforming a wide range of neural methods by a large margin. All the code and data are publicly available online at https://github.com/zcaicaros/TBGAT.
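As a rough illustration of what forward and backward topological sorts of a DAG compute, here is a minimal Kahn-style level-propagation sketch (a generic algorithm, not the paper's attention-based message-passing operator):

```python
def topological_order(adj):
    """Forward topological order of a DAG: each node's order is one plus
    the maximum order among its predecessors, propagated level by level."""
    n = len(adj)
    indeg = [0] * n
    for u in range(n):
        for v in adj[u]:
            indeg[v] += 1
    order = [0] * n
    frontier = [u for u in range(n) if indeg[u] == 0]
    while frontier:
        nxt = []
        for u in frontier:
            for v in adj[u]:
                order[v] = max(order[v], order[u] + 1)
                indeg[v] -= 1
                if indeg[v] == 0:
                    nxt.append(v)
        frontier = nxt
    return order

# Toy disjunctive-graph skeleton; the backward sort is the same
# procedure run on the graph with all arcs reversed.
adj = [[1, 2], [3], [3], []]
print(topological_order(adj))  # [0, 1, 1, 2]
```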

Subject: UAI.2024 - Accept


#2 RE-SORT: Removing Spurious Correlation in Multilevel Interaction for CTR Prediction

Authors: Songli Wu, Liang Du, Jia-Qi Yang, Yuai Wang, De-Chuan Zhan, SHUANG ZHAO, Zixun Sun

Click-through rate (CTR) prediction is a critical task in recommendation systems, serving as the ultimate filtering step to sort items for a user. Most recent cutting-edge methods primarily focus on investigating complex implicit and explicit feature interactions; however, these methods neglect the spurious correlation issue caused by confounding factors, thereby diminishing the model's generalization ability. We propose a CTR prediction framework that REmoves Spurious cORrelations in mulTilevel feature interactions, termed RE-SORT, which has two key components. I. A multilevel stacked recurrent (MSR) structure enables the model to efficiently capture diverse nonlinear interactions from feature spaces at different levels. II. A spurious correlation elimination (SCE) module further leverages Laplacian kernel mapping and sample reweighting methods to eliminate the spurious correlations concealed within the multilevel features, allowing the model to focus on the true causal features. Extensive experiments conducted on four challenging CTR datasets, our production dataset, and an online A/B test demonstrate that the proposed method achieves state-of-the-art performance in both accuracy and speed. The code, models, and datasets will be released at https://github.com/RE-SORT.

Subject: UAI.2024 - Accept


#3 DistriBlock: Identifying adversarial audio samples by leveraging characteristics of the output distribution

Authors: Matias Patricio Pizarro Bustamante, Dorothea Kolossa, Asja Fischer

Adversarial attacks can mislead automatic speech recognition (ASR) systems into predicting an arbitrary target text, thus posing a clear security threat. To prevent such attacks, we propose DistriBlock, an efficient detection strategy applicable to any ASR system that predicts a probability distribution over output tokens in each time step. We measure a set of characteristics of this distribution: the median, maximum, and minimum over the output probabilities, the entropy of the distribution, as well as the Kullback-Leibler and the Jensen-Shannon divergence with respect to the distributions of the subsequent time step. Then, by leveraging the characteristics observed for both benign and adversarial data, we apply binary classifiers, including simple threshold-based classification, ensembles of such classifiers, and neural networks. Through extensive analysis across different state-of-the-art ASR systems and language data sets, we demonstrate the supreme performance of this approach, with a mean area under the receiver operating characteristic curve for distinguishing target adversarial examples against clean and noisy data of 99% and 97%, respectively. To assess the robustness of our method, we show that adaptive adversarial examples that can circumvent DistriBlock are much noisier, which makes them easier to detect through filtering and creates another avenue for preserving the system's robustness.
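For concreteness, the per-step distribution characteristics listed above can be computed as in this minimal sketch (the detection stage itself, threshold rules, ensembles, or neural classifiers on top of these features, is omitted; `probs` is an assumed (T, V) array of per-step token distributions):

```python
import numpy as np
from scipy.stats import entropy
from scipy.spatial.distance import jensenshannon

def distribution_characteristics(probs):
    """probs: (T, V) array; row t is the output distribution at step t."""
    return {
        "median":  np.median(probs, axis=1),
        "max":     probs.max(axis=1),
        "min":     probs.min(axis=1),
        "entropy": entropy(probs, axis=1),
        # Divergences w.r.t. the subsequent time step (length T-1).
        "kl": np.array([entropy(probs[t], probs[t + 1])
                        for t in range(len(probs) - 1)]),
        # jensenshannon returns the JS *distance*; square it for divergence.
        "js": np.array([jensenshannon(probs[t], probs[t + 1]) ** 2
                        for t in range(len(probs) - 1)]),
    }
```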

Subject: UAI.2024 - Accept


#4 Consistency Regularization for Domain Generalization with Logit Attribution Matching

Authors: Han Gao, Kaican Li, Weiyan Xie, Zhi LIN, Yongxiang Huang, Luning Wang, Caleb Chen Cao, Nevin L. Zhang

Domain generalization (DG) is about training models that generalize well under domain shift. Previous research on DG has been conducted mostly in single-source or multi-source settings. In this paper, we consider a third, lesser-known setting where a training domain is endowed with a collection of pairs of examples that share the same semantic information. Such semantic sharing (SS) pairs can be created via data augmentation and then utilized for consistency regularization (CR). We present a theory showing that CR is conducive to DG and propose a novel CR method called Logit Attribution Matching (LAM). We conduct experiments on five DG benchmarks and four pretrained models with SS pairs created by both generic and targeted data augmentation methods. LAM outperforms representative single/multi-source DG methods and various CR methods that leverage SS pairs. The code and data of this project are available at https://github.com/Gaohan123/LAM.

Subject: UAI.2024 - Accept


#5 Decentralized Online Learning in General-Sum Stackelberg Games

Authors: Yaolong Yu, Haipeng Chen

We study an online learning problem in general-sum Stackelberg games, where players act in a decentralized and strategic manner. We study two settings depending on the type of information available to the follower: (1) the limited information setting, where the follower only observes its own reward, and (2) the side information setting, where the follower has extra side information about the leader's reward. We show that for the follower, myopically best responding to the leader's action is the best strategy in the limited information setting, but not necessarily so in the side information setting: the follower can manipulate the leader's reward signals with strategic actions, and hence induce the leader's strategy to converge to an equilibrium that is more favorable to the follower. Based on these insights, we study decentralized online learning for both players in the two settings. Our main contribution is to derive last-iterate convergence and sample complexity results in both settings. Notably, we design a new manipulation strategy for the follower in the latter setting, and show that it has an intrinsic advantage over the best response strategy. Our theory is also supported by empirical results.

Subject: UAI.2024 - Accept


#6 Low-rank Matrix Bandits with Heavy-tailed Rewards

Authors: Yue Kang, Cho-Jui Hsieh, Thomas Lee

In the stochastic low-rank matrix bandit problem, the expected reward of an arm is equal to the inner product between its feature matrix and some unknown $d_1 \times d_2$ low-rank parameter matrix $\Theta$ with rank $r \ll d_1 \wedge d_2$. While all prior studies assume the payoffs are mixed with sub-Gaussian noises, in this work we loosen this strict assumption and consider the new problem of low-rank matrix bandit with heavy-tailed rewards (LowHTR), where the rewards only have a finite $(1+\delta)$ moment for some $\delta \in (0,1]$. By utilizing the truncation on observed payoffs and the dynamic exploration, we propose a novel algorithm called LOTUS attaining the regret bound of order $\tilde{O}(d^{\frac{3}{2}} r^{\frac{1}{2}} T^{\frac{1}{1+\delta}} / \tilde{D}_{rr})$ without knowing $T$, which matches the state-of-the-art regret bound under sub-Gaussian noises \citep{lu2021low,kang2022efficient} with $\delta = 1$. Moreover, we establish a lower bound of the order $\Omega(d^{\frac{\delta}{1+\delta}} r^{\frac{\delta}{1+\delta}} T^{\frac{1}{1+\delta}}) = \Omega(T^{\frac{1}{1+\delta}})$ for LowHTR, which indicates our LOTUS is nearly optimal in the order of $T$. In addition, we improve LOTUS so that it does not require knowledge of the rank $r$, attaining an $\tilde{O}(d r^{\frac{3}{2}} T^{\frac{1+\delta}{1+2\delta}})$ regret bound, and it is efficient under the high-dimensional scenario. We also conduct simulations to demonstrate the practical superiority of our algorithm.
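For intuition, truncation of heavy-tailed payoffs typically takes the following form (a standard device for rewards with a bounded $(1+\delta)$-th moment; LOTUS's exact threshold schedule may differ):

```latex
\tilde r_t \;=\; r_t\,\mathbb{1}\{|r_t| \le b_t\},
\qquad b_t \asymp t^{\frac{1}{1+\delta}},
```

which trades the bias of discarding the tail against the variance of the retained payoffs.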

Subject: UAI.2024 - Accept


#7 Trusted re-weighting for label distribution learning

Authors: Zhuoran Zheng, Chen Wu, YEYING JIN, Xiuyi Jia

Label distribution learning (LDL) is a novel machine learning paradigm that aims to shift 0/1 labels into descriptive degrees to characterize the polysemy of instances. Since the description degree takes a value between 0 and 1, it is difficult for the annotator to accurately annotate each label. Therefore, the predictive ability of numerous LDL algorithms may be degraded by the presence of noise in the label space. To address this problem, we propose a novel stability-trust LDL framework that aims to reconstruct the feature space of an arbitrary LDL dataset by using feature decoupling and prototype guidance. Specifically, we first use prototype learning to select reliable cluster centers (representative vectors of label distributions) and filter a set of clean samples from the original, label-noise-contaminated dataset. Then, we decouple the feature space (eliminating correlations among features) by modeling a weight assigner that is learned on this clean sample set, thus assigning a weight to each sample of the original dataset. Finally, any existing LDL algorithm can be trained on this new re-weighted dataset for the goal of robust modeling. In addition, we create a new image dataset to support the training and testing of compared models. Experimental results demonstrate that the proposed framework boosts the performance of LDL algorithms on datasets with label noise.

Subject: UAI.2024 - Accept


#8 Partial Identification with Proxy of Latent Confoundings via Sum-of-ratios Fractional Programming

Authors: Zhiheng Zhang, Xinyan Su

Causal effect estimation is a crucial theoretical tool in uncertainty analysis. The challenge of unobservable confounding has raised concerns regarding quantitative causality computation. To address this issue, proxy control has become popular, employing auxiliary variables W as proxies for the confounding variables U. However, proximal methods rely on strong assumptions, such as reversibility and completeness, that are challenging to interpret and verify empirically. Consequently, their applicability in real-world scenarios is limited, particularly when the proxies lack informativeness. In this paper, we develop a novel optimization method named Partial Identification with Proxy of Latent Confoundings via Sum-of-Ratios Fractional Programming (PI-SFP). This method does not impose any additional restrictions upon the proxies and only assumes mild partial observability of the transition matrix P(W | U). We theoretically prove the global convergence of PI-SFP to a valid bound of the causal effect and analyze the conditions under which the bounds are tight. Our synthetic and real-world experiments validate our theoretical framework.

Subject: UAI.2024 - Accept


#9 SMuCo: Reinforcement Learning for Visual Control via Sequential Multi-view Total Correlation

Authors: Tong Cheng, Hang Dong, Lu Wang, Bo Qiao, Qingwei Lin, Saravan Rajmohan, Thomas Moscibroda

The advent of abundant image data has catalyzed the advancement of visual control in reinforcement learning (RL) systems, leveraging multiple viewpoints to capture the same physical states, which could theoretically enhance control performance. However, integrating multi-view data into representation learning remains challenging. In this paper, we introduce SMuCo, an innovative multi-view reinforcement learning algorithm that constructs robust latent representations by optimizing multi-view sequential total correlation. This technique effectively captures task-relevant information and temporal dynamics while filtering out irrelevant data. Our method supports an unlimited number of views and demonstrates superior performance over leading model-free and model-based RL algorithms. Empirical results from the DeepMind Control Suite and the Sapien Basic Manipulation Task confirm SMuCo's enhanced efficacy, significantly improving task performance across diverse scenarios and views.

Subject: UAI.2024 - Accept


#10 Fast Reliability Estimation for Neural Networks with Adversarial Attack-Driven Importance Sampling

Authors: Karim TIT, Teddy Furon

This paper introduces a novel approach to evaluate the reliability of Neural Networks (NNs) by integrating adversarial attacks with Importance Sampling (IS), enhancing the assessment's precision and efficiency. Leveraging adversarial attacks to guide IS, our method efficiently identifies vulnerable input regions, offering a more directed alternative to traditional Monte Carlo methods. While comparing our approach with classical reliability techniques like FORM and SORM, and with classical rare event simulation methods such as Cross-Entropy IS, we acknowledge its reliance on the effectiveness of adversarial attacks and its inability to handle very high-dimensional data such as ImageNet. Despite these challenges, our comprehensive empirical validations on the MNIST and CIFAR10 datasets demonstrate the method's capability to accurately estimate NN reliability for a variety of models. Our research not only presents an innovative strategy for reliability assessment in NNs but also sets the stage for further work exploiting the connection between adversarial robustness and the field of statistical reliability engineering.
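A minimal sketch of the underlying rare-event estimator follows; the attack-driven construction of the proposal distribution q is the paper's contribution and is abstracted away here, and all function arguments are hypothetical placeholders:

```python
import numpy as np

def failure_probability_is(model_fails, log_p, log_q, sample_q, n=10_000):
    """Importance-sampling estimate of P_p(failure):
    E_p[1{fail}] = E_q[1{fail} * p(x)/q(x)], where q should concentrate
    on vulnerable input regions (e.g., around adversarial examples)."""
    xs = sample_q(n)                       # draws from the proposal q
    w = np.exp(log_p(xs) - log_q(xs))      # importance weights p/q
    fail = model_fails(xs).astype(float)   # 1 if the NN misbehaves on x
    vals = fail * w
    return vals.mean(), vals.std(ddof=1) / np.sqrt(n)  # estimate, std. error
```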

Subject: UAI.2024 - Accept


#11 Learning from Crowds with Dual-View K-Nearest Neighbor

Authors: Jiao Li, Liangxiao Jiang, Xue Wu, Wenjun Zhang

In crowdsourcing scenarios, we can obtain multiple noisy labels from different crowd workers for each instance and then infer its integrated label via label integration. To achieve better performance, some recently published label integration methods have attempted to exploit the multiple noisy labels of inferred instances' nearest neighbors via the K-nearest neighbor (KNN) algorithm. However, the KNN algorithm they use searches for inferred instances' nearest neighbors relying only on distance functions defined in the original attribute view, totally ignoring the valuable information hidden in the multiple noisy labels, which limits their performance. Motivated by multi-view learning, we define the multiple noisy labels as another label view of instances and propose to search for inferred instances' nearest neighbors using the joint information from both the original attribute view and the multiple noisy label view. To this end, we propose a novel label integration method called dual-view K-nearest neighbor (DVKNN). In DVKNN, we first define a new distance function to search for the K-nearest neighbors of an inferred instance. Then, we define a fine-grained weight for each noisy label from each neighbor. Finally, we perform weighted majority voting (WMV) on all these noisy labels to obtain the integrated label of the inferred instance. Extensive experiments validate the effectiveness and rationality of DVKNN.
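To make the two views concrete, here is a hypothetical sketch (the actual DVKNN distance function and fine-grained weights are defined in the paper; `alpha` and both distance choices are illustrative only):

```python
import numpy as np

def dual_view_distance(x_i, x_j, l_i, l_j, alpha=0.5):
    """Mix an attribute-view distance with a noisy-label-view distance,
    where l_i, l_j are the empirical distributions of the crowd workers'
    noisy labels for the two instances."""
    return alpha * np.linalg.norm(x_i - x_j) + (1 - alpha) * np.linalg.norm(l_i - l_j)

def weighted_majority_vote(neighbor_noisy_labels, weights, n_classes):
    """Integrated label = argmax of weighted votes over neighbors' noisy labels."""
    votes = np.zeros(n_classes)
    for label, w in zip(neighbor_noisy_labels, weights):
        votes[label] += w
    return int(np.argmax(votes))
```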

Subject: UAI.2024 - Accept


#12 BanditQ: Fair Bandits with Guaranteed Rewards

Author: Abhishek Sinha

Classic no-regret multi-armed bandit algorithms, including the Upper Confidence Bound (UCB), Hedge, and EXP3, are inherently unfair by design. Their unfairness stems from their objective of playing the most rewarding arm as frequently as possible while ignoring the rest. In this paper, we consider a fair prediction problem in the stochastic setting with a guaranteed minimum rate of accrual of rewards for each arm. We study the problem in both full-information and bandit feedback settings. Combining queueing-theoretic techniques with adversarial bandits, we propose a new online policy called BanditQ that achieves the target reward rates while conceding a regret and target rate violation penalty of at most $O(T^{\frac{3}{4}})$. The regret bound in the full-information setting can be further improved to $O(\sqrt{T})$ under either a monotonicity assumption or when considering time-averaged regret. The proposed policy is efficient and admits a black-box reduction from the fair prediction problem to the standard adversarial MAB problem. The analysis of the BanditQ policy involves a new self-bounding inequality, which might be of independent interest.
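The queueing-theoretic ingredient usually amounts to a virtual deficit queue per arm, along the lines of (a standard construction in this line of work; BanditQ's exact recursion may differ):

```latex
Q_i(t+1) \;=\; \Big(Q_i(t) + \lambda_i - r_i(t)\,\mathbb{1}\{A_t = i\}\Big)^{+},
```

where $\lambda_i$ is arm $i$'s guaranteed reward rate; a large $Q_i(t)$ signals that arm $i$ is behind its target and should be prioritized by the policy.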

Subject: UAI.2024 - Accept


#13 Neighbor Similarity and Multimodal Alignment based Product Recommendation Study

Authors: Zhiqiang Zhang, Yongqiang Jiang, Qian Gao, Zhipeng Wang

Existing multimodal recommendation research still faces some challenges, such as being unable to fully mine the implicit relevance information of neighbor nodes and allocating weights unreasonably to imbalanced nodes. To address these challenges, this paper introduces a new multimodal recommendation model called NSMAR+. Specifically, the model first constructs a neighbor similarity graph convolutional network to capture the implicit relevance information and reasonably assigns attention weights through the graph attention mechanism. Second, the model introduces a modal alignment and fusion mechanism, using a multilayer perceptron (MLP) to map image and text features into a shared space for comparison and fusion. In addition, the model constructs a user co-interaction graph and an item semantic graph based on the original information and performs graph convolution operations to enhance the preference information of users and items and to better capture the interactions and internal features between them. Finally, an MLP is employed to aggregate user and item representations and predict personalized recommendation rankings. To validate its efficacy, this paper compares NSMAR+ with several leading multimodal recommendation models on public datasets, achieving a performance improvement of 1% to 3%. The experimental outcomes indicate that the proposed model offers good superiority and accuracy in multimodal recommendation tasks.

Subject: UAI.2024 - Accept


#14 Differentiable Pareto-Smoothed Weighting for High-Dimensional Heterogeneous Treatment Effect Estimation

Authors: Yoichi Chikahara, Kansei Ushiyama

There is a growing interest in estimating heterogeneous treatment effects across individuals using their high-dimensional feature attributes. Achieving high performance in such high-dimensional heterogeneous treatment effect estimation is challenging because, in this setup, some features typically induce sample selection bias while others do not induce bias yet are predictive of potential outcomes. To avoid losing such predictive feature information, existing methods learn separate feature representations using inverse probability weighting (IPW). However, due to their numerically unstable IPW weights, these methods suffer from estimation bias under a finite sample setup. To develop a numerically robust estimator via weighted representation learning, we propose a differentiable Pareto-smoothed weighting framework that replaces extreme weight values in an end-to-end fashion. Our experimental results show that by effectively correcting the weight values, our proposed method outperforms the existing ones, including traditional weighting schemes. Our code is available at https://github.com/ychika/DPSW.
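To illustrate the general idea of correcting extreme weights differentiably, here is a simple tanh soft clip as a stand-in (the paper's actual mechanism is Pareto smoothing of the weight tail, not this):

```python
import torch

def soft_clip_weights(w: torch.Tensor, tau: float = 10.0) -> torch.Tensor:
    """Differentiable surrogate for truncating extreme inverse-probability
    weights: approximately the identity for w << tau and saturating near
    tau for large w, so gradients stay finite end to end."""
    return tau * torch.tanh(w / tau)
```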

Subject: UAI.2024 - Accept


#15 Calibrated and Conformal Propensity Scores for Causal Effect Estimation

Authors: Shachi Deshpande, Volodymyr Kuleshov

Propensity scores are commonly used to balance observed covariates while estimating treatment effects. We argue that the probabilistic output of a learned propensity score model should be calibrated, i.e., a predicted treatment probability of 90% should correspond to 90% of individuals being assigned to the treatment group. We propose simple recalibration techniques to ensure this property. We prove that calibration is a necessary condition for unbiased treatment effect estimation when using popular inverse propensity weighted and doubly robust estimators. We derive error bounds on causal effect estimates that directly relate to the quality of uncertainties provided by the probabilistic propensity score model and show that calibration strictly improves this error bound while also avoiding extreme propensity weights. We demonstrate improved causal effect estimation with calibrated propensity scores in several tasks, including high-dimensional image covariates and genome-wide association studies (GWASs). Calibrated propensity scores improve the speed of GWAS analysis by more than two-fold by enabling the use of simpler models that are faster to train.
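One simple recalibration recipe of the kind the abstract alludes to is post-hoc isotonic regression on held-out data (a generic sketch; the paper's exact techniques may differ):

```python
from sklearn.isotonic import IsotonicRegression

def recalibrate_propensity(scores_val, treated_val, scores_new):
    """Map raw propensity scores to calibrated treatment probabilities by
    fitting a monotone regression of observed treatment on the score."""
    iso = IsotonicRegression(y_min=1e-3, y_max=1 - 1e-3, out_of_bounds="clip")
    iso.fit(scores_val, treated_val)       # treated_val: 0/1 assignments
    return iso.predict(scores_new)         # calibrated scores for IPW/DR
```

Clipping the outputs away from 0 and 1 also avoids the extreme propensity weights mentioned above.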

Subject: UAI.2024 - Accept


#16 Hybrid CtrlFormer: Learning Adaptive Search Space Partition for Hybrid Action Control via Transformer-based Monte Carlo Tree Search

Authors: Jiashun Liu, Xiaotian Hao, Jianye HAO, YAN ZHENG, Yujing Hu, Changjie Fan, Tangjie Lv, Zhipeng Hu

Hybrid action control tasks are common in the real world, requiring some discrete and continuous actions to be controlled simultaneously. To solve these tasks, existing deep reinforcement learning (DRL) methods either directly build a separate policy for each type of action or simplify the hybrid action space into a discrete or continuous action control problem. However, these methods neglect the challenge of exploration resulting from the complexity of the hybrid action space. Thus, it is necessary to design more sample-efficient algorithms. To this end, we propose a novel Hybrid Control Transformer (Hybrid CtrlFormer) to achieve better exploration and exploitation for hybrid action control problems. The core idea is: 1) we construct a hybrid action space tree with the discrete actions at the higher level and the continuous parameter space at the lower level, where each parameter space is split into multiple subregions; 2) to simplify the exploration space, a Transformer-based Monte-Carlo tree search method is designed to efficiently evaluate and partition the hybrid action space into good and bad subregions along the tree. Our method achieves state-of-the-art performance and sample efficiency in a variety of environments with discrete-continuous action spaces.

Subject: UAI.2024 - Accept


#17 Online Policy Optimization for Robust Markov Decision Process

Authors: Jing Dong, Jingwei Li, Baoxiang Wang, Jingzhao Zhang

Reinforcement learning (RL) has exceeded human performance in many synthetic settings such as video games and Go. However, real-world deployment of end-to-end RL models is less common, as RL models can be very sensitive to perturbations in the environment. The robust Markov decision process (MDP) framework---in which the transition probabilities belong to an uncertainty set around a nominal model---provides one way to develop robust models. While previous analysis for robust MDP shows RL algorithms are effective assuming access to a generative model, it remains unclear whether RL can be efficient under a more realistic online setting, which requires a careful balance between exploration and exploitation. In this work, we consider online robust MDP by interacting with an unknown nominal system. We propose a robust optimistic policy optimization algorithm that is provably efficient. To address the additional uncertainty caused by an adversarial environment, our model features a new optimistic update rule derived via Fenchel conjugates. Our analysis establishes the first regret bound for online robust MDPs.

Subject: UAI.2024 - Accept


#18 ContextFlow++: Generalist-Specialist Flow-based Generative Models with Mixed-variable Context Encoding

Authors: Denis A Gudovskiy, Tomoyuki Okuno, Yohei Nakata

Normalizing flow-based generative models have been widely used in applications where exact density estimation is of major importance. Recent research proposes numerous methods to improve their expressivity. However, conditioning on a context is a largely overlooked area in bijective flow research. Conventional conditioning via vector concatenation is limited to only a few flow types. More importantly, this approach cannot support a practical setup where a set of context-conditioned (*specialist*) models are trained with a fixed pretrained general-knowledge (*generalist*) model. We propose the ContextFlow++ approach to overcome these limitations using additive conditioning with explicit generalist-specialist knowledge decoupling. Furthermore, we support discrete contexts via the proposed mixed-variable architecture with context encoders. In particular, our context encoder for discrete variables is a surjective flow from which the context-conditioned continuous variables are sampled. Our experiments on the rotated MNIST-R, corrupted CIFAR-10C, real-world ATM predictive maintenance, and SMAP unsupervised anomaly detection benchmarks show that the proposed ContextFlow++ offers faster, stable training and achieves higher performance metrics. Our code is publicly available at [github.com/gudovskiy/contextflow](https://github.com/gudovskiy/contextflow).

Subject: UAI.2024 - Accept


#19 Two Facets of SDE Under an Information-Theoretic Lens: Generalization of SGD via Training Trajectories and via Terminal States

Authors: Ziqiao Wang, Yongyi Mao

Stochastic differential equations (SDEs) have recently been shown to characterize well the dynamics of training machine learning models with SGD. When the generalization error of the SDE approximation closely aligns with that of SGD in expectation, it provides two opportunities for better understanding the generalization behaviour of SGD through its SDE approximation. First, viewing SGD as full-batch gradient descent with Gaussian gradient noise allows us to obtain a trajectory-based generalization bound using the information-theoretic bound from Xu and Raginsky [2017]. Second, assuming mild conditions, we estimate the steady-state weight distribution of the SDE and use information-theoretic bounds from Xu and Raginsky [2017] and Negrea et al. [2019] to establish terminal-state-based generalization bounds. Our proposed bounds have some advantages: notably, the trajectory-based bound outperforms the results in Wang and Mao [2022], and the terminal-state-based bound exhibits a fast decay rate comparable to stability-based bounds.
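The SDE approximation referred to here is commonly written as (conventions for the noise scaling vary across papers):

```latex
d\theta_t \;=\; -\nabla L(\theta_t)\,dt \;+\; \sqrt{\eta}\,\Sigma(\theta_t)^{1/2}\,dW_t,
```

with learning rate $\eta$ and gradient-noise covariance $\Sigma$, i.e., full-batch gradient descent perturbed by Gaussian gradient noise.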

Subject: UAI.2024 - Accept


#20 Guaranteeing Robustness Against Real-World Perturbations In Time Series Classification Using Conformalized Randomized Smoothing

Authors: Nicola Franco, Jakob Spiegelberg, Jeanette Miriam Lorenz, Stephan Günnemann

Certifying the robustness of machine learning models against domain shifts and input space perturbations is crucial for many applications where high-risk decisions are based on the model's predictions. Techniques such as randomized smoothing have partially addressed these issues in the past, with a focus on adversarial attacks. In this paper, we generalize randomized smoothing to arbitrary transformations and extend it to conformal prediction. The proposed ansatz is demonstrated on a time series classifier connected to an automotive use case. We meticulously assess the robustness of smooth classifiers in environments subjected to various degrees and types of native time-series perturbations and compare it against standard conformal predictors. The proposed method consistently offers superior resistance to perturbations, maintaining high classification accuracy and reliability. Additionally, we are able to bound the performance on new domains by calibrating generalization with configuration shifts in the training data. In combination, conformalized randomized smoothing may offer a model-agnostic approach to construct robust classifiers tailored to the perturbations in their respective applications, a crucial capability for AI assurance argumentation.
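A minimal sketch of the smoothing step generalized above (the conformal calibration of the resulting vote frequencies is the paper's addition; `classifier` and `perturb` are hypothetical placeholders):

```python
def smoothed_predict(classifier, x, perturb, n=1000):
    """Generic randomized smoothing: classify n randomly perturbed copies
    of x (Gaussian noise classically; arbitrary time-series perturbations
    in the generalized setting) and return the majority class along with
    its empirical vote frequency."""
    counts = {}
    for _ in range(n):
        y = classifier(perturb(x))
        counts[y] = counts.get(y, 0) + 1
    top = max(counts, key=counts.get)
    return top, counts[top] / n
```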

Subject: UAI.2024 - Accept


#21 Can we Defend Against the Unknown? An Empirical Study About Threshold Selection for Neural Network Monitoring

Authors: Khoi Tran DANG, Kevin Delmas, Jeremie Guiochet, Joris Guerin

With the increasing use of neural networks in critical systems, runtime monitoring becomes essential to reject unsafe predictions during inference. Various techniques have emerged to establish rejection scores that maximize the separability between the distributions of safe and unsafe predictions. The efficacy of these approaches is mostly evaluated using threshold-agnostic metrics, such as the area under the receiver operating characteristic curve. However, in real-world applications, an effective monitor also requires identifying a good threshold to transform these scores into meaningful binary decisions. Despite the pivotal importance of threshold optimization, this problem has received little attention. A few studies touch upon this question, but they typically assume that the runtime data distribution mirrors the training distribution, which is a strong assumption as monitors are supposed to safeguard a system against potentially unforeseen threats. In this work, we present rigorous experiments on various image datasets to investigate: 1. The effectiveness of monitors in handling unforeseen threats, which are not available during threshold adjustments. 2. Whether integrating generic threats into the threshold optimization scheme can enhance the robustness of monitors.
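For context, the usual way to turn a threshold-agnostic score into a binary monitor fixes the threshold on validation data, as sketched below; this is exactly the kind of in-distribution choice whose robustness to unforeseen threats the paper investigates (a generic recipe, assuming higher scores indicate unsafe predictions):

```python
import numpy as np

def threshold_for_tpr(safe_scores, target_acceptance=0.95):
    """Choose a rejection threshold so that at most 1 - target_acceptance
    of validation-time *safe* predictions would be rejected; at runtime,
    reject any prediction whose score exceeds the threshold."""
    return np.quantile(safe_scores, target_acceptance)
```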

Subject: UAI.2024 - Accept


#22 GeONet: a neural operator for learning the Wasserstein geodesic

Authors: Andrew Gracyk, Xiaohui Chen

Optimal transport (OT) offers a versatile framework to compare complex data distributions in a geometrically meaningful way. Traditional methods for computing the Wasserstein distance and geodesic between probability measures require mesh-specific domain discretization and suffer from the curse of dimensionality. We present GeONet, a mesh-invariant deep neural operator network that learns the non-linear mapping from an input pair of initial and terminal distributions to the Wasserstein geodesic connecting the two endpoint distributions. In the offline training stage, GeONet learns the saddle point optimality conditions for the dynamic formulation of the OT problem in the primal and dual spaces, which are characterized by a coupled PDE system. The subsequent inference stage is instantaneous and can be deployed for real-time predictions in the online learning setting. We demonstrate that GeONet achieves comparable testing accuracy to standard OT solvers on simulation examples and the MNIST dataset with inference-stage computational cost reduced by orders of magnitude.
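The dynamic formulation mentioned above is the Benamou-Brenier problem:

```latex
W_2^2(\mu_0,\mu_1) \;=\; \min_{(\rho,v)} \int_0^1\!\!\int \rho_t(x)\,\|v_t(x)\|^2\,dx\,dt
\quad \text{s.t.}\quad \partial_t \rho_t + \nabla\cdot(\rho_t v_t) = 0,\;\; \rho_0=\mu_0,\;\rho_1=\mu_1,
```

whose optimal density path $(\rho_t)_{t\in[0,1]}$ is the Wasserstein geodesic that GeONet is trained to output.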

Subject: UAI.2024 - Accept


#23 Metric Learning from Limited Pairwise Preference Comparisons

Authors: Zhi Wang, Geelon So, Ramya Korlakai Vinayak

We study metric learning from preference comparisons under the ideal point model, in which a user prefers an item over another if it is closer to their latent ideal item. These items are embedded into $\mathbb{R}^d$ equipped with an unknown Mahalanobis distance shared across users. While recent work shows that it is possible to simultaneously recover the metric and ideal items given $O(d)$ pairwise comparisons per user, in practice we often have a limited budget of $o(d)$ comparisons. We study whether the metric can still be recovered, even though learning individual ideal items is no longer possible. We show that, on the one hand, $o(d)$ comparisons may not reveal any information about the metric, even with infinitely many users. On the other hand, when comparisons are made over items that exhibit low-dimensional structure, each user can contribute to learning the metric restricted to a low-dimensional subspace, so that the metric can be jointly identified. We present a divide-and-conquer approach that achieves this, and provide theoretical recovery guarantees and empirical validation.
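Concretely, the ideal point model described above says that user $u$ prefers item $x$ over item $y$ iff

```latex
\|x - u\|_M < \|y - u\|_M,
\qquad \|z\|_M^2 := z^\top M z,\;\; M \succ 0,
```

where $u \in \mathbb{R}^d$ is the user's latent ideal item and $M$ is the Mahalanobis metric shared across users.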

Subject: UAI.2024 - Accept


#24 Bayesian Pseudo-Coresets via Contrastive Divergence

Authors: Piyush Tiwary, Kumar Shubham, Vivek V Kashyap, Prathosh AP

Bayesian methods provide an elegant framework for estimating parameter posteriors and quantifying the uncertainty associated with probabilistic models. However, they often suffer from slow inference times. To address this challenge, Bayesian Pseudo-Coresets (BPC) have emerged as a promising solution. BPC methods aim to create a small synthetic dataset, known as a pseudo-coreset, that approximates the posterior inference achieved with the original dataset. This approximation is achieved by optimizing a divergence measure between the true posterior and the pseudo-coreset posterior. Various divergence measures have been proposed for constructing pseudo-coresets, with the forward Kullback-Leibler (KL) divergence being the most successful. However, using the forward KL divergence necessitates sampling from the pseudo-coreset posterior, often accomplished through approximate Gaussian variational distributions. Alternatively, one could employ Markov Chain Monte Carlo (MCMC) methods for sampling, but this becomes challenging in high-dimensional parameter spaces due to slow mixing. In this study, we introduce a novel approach for constructing pseudo-coresets by utilizing contrastive divergence. Importantly, optimizing contrastive divergence eliminates the need for approximations in the pseudo-coreset construction process. Furthermore, it enables the use of finite-step MCMC methods, alleviating the requirement for extensive mixing to reach a stationary distribution. To validate our method's effectiveness, we conduct extensive experiments on multiple datasets, demonstrating its superiority over existing BPC techniques. Our implementation is available at https://github.com/backpropagator/BPC-CD.

Subject: UAI.2024 - Accept


#25 How Inverse Conditional Flows Can Serve as a Substitute for Distributional Regression

Authors: Lucas Kook, Chris Kolb, Philipp Schiele, Daniel Dold, Marcel Arpogaus, Cornelius Fritz, Philipp F. M. Baumann, Philipp Kopper, Tobias Pielok, Emilio Dorigatti, David Rügamer

Neural network representations of simple models, such as linear regression, are being studied increasingly to better understand the underlying principles of deep learning algorithms. However, neural representations of distributional regression models, such as the Cox model, have received little attention so far. We close this gap by proposing a framework for distributional regression using inverse flow transformations (DRIFT), which includes neural representations of the aforementioned models. We empirically demonstrate that the neural representations of models in DRIFT can serve as a substitute for their classical statistical counterparts in several applications involving continuous, ordered, time-series, and survival outcomes. We confirm that models in DRIFT empirically match the performance of several statistical methods in terms of estimation of partial effects, prediction, and aleatoric uncertainty quantification. DRIFT covers both interpretable statistical models and flexible neural networks, opening up new avenues in both statistical modeling and deep learning.

Subject: UAI.2024 - Accept