| Total: 10

Datasets involving sequences of different types of events without meaningful time stamps are prevalent in many applications, for instance when extracted from textual corpora. We propose a family of models for such event sequences -- summary Markov models -- where the probability of observing an event type depends only on a summary of historical occurrences of its influencing set of event types. This Markov model family is motivated by Granger causal models for time series, with the important distinction that only one event can occur in a position in an event sequence. We show that a unique minimal influencing set exists for any set of event types of interest and choice of summary function, formulate two novel models from the general family that represent specific sequence dynamics, and propose a greedy search algorithm for learning them from event sequence data. We conduct an experimental investigation comparing the proposed models with relevant baselines, and illustrate their knowledge acquisition and discovery capabilities through case studies involving sequences from text.

Unobserved confounding is the main obstacle to causal effect estimation from observational data. Instrumental variables (IVs) are widely used for causal effect estimation when there exist latent confounders. With the standard IV method, when a given IV is valid, unbiased estimation can be obtained, but the validity requirement on a standard IV is strict and untestable. Conditional IVs have been proposed to relax the requirement of standard IVs by conditioning on a set of observed variables (known as a conditioning set for a conditional IV). However, the criterion for finding a conditioning set for a conditional IV needs a directed acyclic graph (DAG) representing the causal relationships of both observed and unobserved variables. This makes it challenging to discover a conditioning set directly from data. In this paper, by leveraging maximal ancestral graphs (MAGs) for causal inference with latent variables, we study the graphical properties of ancestral IVs, a type of conditional IVs using MAGs, and develop the theory to support data-driven discovery of the conditioning set for a given ancestral IV in data under the pretreatment variable assumption. Based on the theory, we develop an algorithm for unbiased causal effect estimation with a given ancestral IV and observational data. Extensive experiments on synthetic and real-world datasets demonstrate the performance of the algorithm in comparison with existing IV methods.

Causal discovery is to learn cause-effect relationships among variables given observational data and is important for many applications. Existing causal discovery methods assume data sufficiency, which may not be the case in many real world datasets. As a result, many existing causal discovery methods can fail under limited data. In this work, we propose Bayesian-augmented frequentist independence tests to improve the performance of constraint-based causal discovery methods under insufficient data: 1) We firstly introduce a Bayesian method to estimate mutual information (MI), based on which we propose a robust MI based independence test; 2) Secondly, we consider the Bayesian estimation of hypothesis likelihood and incorporate it into a well-defined statistical test, resulting in a robust statistical testing based independence test. We apply proposed independence tests to constraint-based causal discovery methods and evaluate the performance on benchmark datasets with insufficient samples. Experiments show significant performance improvement in terms of both accuracy and efficiency over SOTA methods.

We introduce hidden 1-counter Markov models (H1MMs) as an attractive sweet spot between standard hidden Markov models (HMMs) and probabilistic context-free grammars (PCFGs). Both HMMs and PCFGs have a variety of applications, e.g., speech recognition, anomaly detection, and bioinformatics. PCFGs are more expressive than HMMs, e.g., they are more suited for studying protein folding or natural language processing. However, they suffer from slow parameter fitting, which is cubic in the observation sequence length. The same process for HMMs is just linear using the well-known forward-backward algorithm. We argue that by adding to each state of an HMM an integer counter, e.g., representing the number of clients waiting in a queue, brings its expressivity closer to PCFGs. At the same time, we show that parameter fitting for such a model is computationally inexpensive: it is bi-linear in the length of the observation sequence and the maximal counter value, which grows slower than the observation length. The resulting model of H1MMs allows us to combine the best of both worlds: more expressivity with faster parameter fitting.

Sum-Product Networks (SPNs) are expressive probabilistic models that provide exact, tractable inference. They achieve this efficiency by making use of local independence. On the other hand, mixtures of exchangeable variable models (MEVMs) are a class of tractable probabilistic models that make use of exchangeability of discrete random variables to render inference tractable. Exchangeability, which arises naturally in relational domains, has not been considered for efficient representation and inference in SPNs yet. The contribution of this paper is a novel probabilistic model which we call Exchangeability-Aware Sum-Product Networks (XSPNs). It contains both SPNs and MEVMs as special cases, and combines the ability of SPNs to efficiently learn deep probabilistic models with the ability of MEVMs to efficiently handle exchangeable random variables. We introduce a structure learning algorithm for XSPNs and empirically show that they can be more accurate than conventional SPNs when the data contains repeated, interchangeable parts.

Many real-world phenomena arise from causal relationships among a set of variables. As a powerful tool, Bayesian Network (BN) has been successful in describing high-dimensional distributions. However, the faithfulness condition, enforced in most BN learning algorithms, is violated in the settings where multiple variables synergistically affect the outcome (i.e., with polyadic dependencies). Building upon recent development in cluster causal diagrams (C-DAGs), we initiate the formal study of learning C-DAGs from observational data to relax the faithfulness condition. We propose a new scoring function, the Clustering Information Criterion (CIC), based on information-theoretic measures that represent various complex interactions among variables. The CIC score also contains a penalization of the model complexity under the minimum description length principle. We further provide a searching strategy to learn structures of high scores. Experiments on both synthetic and real data support the effectiveness of the proposed method.

In a sequential decision-making problem, having a structural dependency amongst the reward distributions associated with the arms makes it challenging to identify a subset of alternatives that guarantees the optimal collective outcome. Thus, besides individual actions' reward, learning the causal relations is essential to improve the decision-making strategy. To solve the two-fold learning problem described above, we develop the 'combinatorial semi-bandit framework with causally related rewards', where we model the causal relations by a directed graph in a stationary structural equation model. The nodal observation in the graph signal comprises the corresponding base arm's instantaneous reward and an additional term resulting from the causal influences of other base arms' rewards. The objective is to maximize the long-term average payoff, which is a linear function of the base arms' rewards and depends strongly on the network topology. To achieve this objective, we propose a policy that determines the causal relations by learning the network's topology and simultaneously exploits this knowledge to optimize the decision-making process. We establish a sublinear regret bound for the proposed algorithm. Numerical experiments using synthetic and real-world datasets demonstrate the superior performance of our proposed method compared to several benchmarks.

In many domains, worst-case guarantees on the performance (e.g. prediction accuracy) of a decision function subject to distributional shifts and uncertainty about the environment are crucial. In this work we develop a method to quantify the robustness of decision functions with respect to credal Bayesian networks, formal parametric models of the environment where uncertainty is expressed through credal sets on the parameters. In particular, we address the maximum marginal probability (MARmax) problem, that is, determining the greatest probability of an event (such as misclassification) obtainable for parameters in the credal set. We develop a method to faithfully transfer the problem into a constrained optimization problem on a probabilistic circuit. By performing a simple constraint relaxation, we show how to obtain a guaranteed upper bound on MARmax in linear time in the size of the circuit. We further theoretically characterize this constraint relaxation in terms of the original Bayesian network structure, which yields insight into the tightness of the bound. We implement the method and provide experimental evidence that the upper bound is often near tight and demonstrates improved scalability compared to other methods.

In many applications with real-world consequences, it is crucial to develop reliable uncertainty estimation for the predictions made by the AI decision systems. Targeting at the goal of estimating uncertainty, various deep neural network (DNN) based uncertainty estimation algorithms have been proposed. However, the robustness of the uncertainty returned by these algorithms has not been systematically explored. In this work, to raise the awareness of the research community on robust uncertainty estimation, we show that state-of-the-art uncertainty estimation algorithms could fail catastrophically under our proposed adversarial attack despite their impressive performance on uncertainty estimation. In particular, we aim at attacking out-domain uncertainty estimation: under our attack, the uncertainty model would be fooled to make high-confident predictions for the out-domain data, which they originally would have rejected. Extensive experimental results on various benchmark image datasets show that the uncertainty estimated by state-of-the-art methods could be easily corrupted by our attack.

The recently developed Particle-based Variational Inference (ParVI) methods drive the empirical distribution of a set of fixed-weight particles towards a given target distribution by iteratively updating particles' positions. However, the fixed weight restriction greatly confines the empirical distribution's approximation ability, especially when the particle number is limited. In this paper, we propose to dynamically adjust particles' weights according to a Fisher-Rao reaction flow. We develop a general Dynamic-weight Particle-based Variational Inference (DPVI) framework according to a novel continuous composite flow, which evolves the positions and weights of particles simultaneously. We show that the mean-field limit of our composite flow is actually a Wasserstein-Fisher-Rao gradient flow of the associated dissimilarity functional. By using different finite-particle approximations in our general framework, we derive several efficient DPVI algorithms. The empirical results demonstrate the superiority of our derived DPVI algorithms over their fixed-weight counterparts.