| Total: 38

We introduce a method to learn imitative policies from expert demonstrations that are interpretable and manipulable. We achieve interpretability by modeling the interactions between high-level actions as an automaton with connections to formal logic. We achieve manipulability by integrating this automaton into planning, so that changes to the automaton have predictable effects on the learned behavior. These qualities allow a human user to first understand what the model has learned, and then either correct the learned behavior or zero-shot generalize to new, similar tasks. We build upon previous work by no longer requiring additional supervised information which is hard to collect in practice. We achieve this by using a deep Bayesian nonparametric hierarchical model. We test our model on several domains and also show results for a real-world implementation on a mobile robotic arm platform.

We study the novel problem of blackbox optimization of multiple objectives via multi-fidelity function evaluations that vary in the amount of resources consumed and their accuracy. The overall goal is to appromixate the true Pareto set of solutions by minimizing the resources consumed for function evaluations. For example, in power system design optimization, we need to find designs that trade-off cost, size, efficiency, and thermal tolerance using multi-fidelity simulators for design evaluations. In this paper, we propose a novel approach referred as Multi-Fidelity Output Space Entropy Search for Multi-objective Optimization (MF-OSEMO) to solve this problem. The key idea is to select the sequence of candidate input and fidelity-vector pairs that maximize the information gained about the true Pareto front per unit resource cost. Our experiments on several synthetic and real-world benchmark problems show that MF-OSEMO, with both approximations, significantly improves over the state-of-the-art single-fidelity algorithms for multi-objective optimization. Please note: A corrigendum was submitted for this paper on 24 September 2020.

We consider the problem of multi-objective (MO) blackbox optimization using expensive function evaluations, where the goal is to approximate the true Pareto set of solutions while minimizing the number of function evaluations. For example, in hardware design optimization, we need to find the designs that trade-off performance, energy, and area overhead using expensive simulations. We propose a novel uncertainty-aware search framework referred to as USeMO to efficiently select the sequence of inputs for evaluation to solve this problem. The selection method of USeMO consists of solving a cheap MO optimization problem via surrogate models of the true functions to identify the most promising candidates and picking the best candidate based on a measure of uncertainty. We also provide theoretical analysis to characterize the efficacy of our approach. Our experiments on several synthetic and six diverse real-world benchmark problems show that USeMO consistently outperforms the state-of-the-art algorithms.

In this work, we develop a new approach to generative density estimation for exchangeable, non-i.i.d. data. The proposed framework, FlowScan, combines invertible flow transformations with a sorted scan to flexibly model the data while preserving exchangeability. Unlike most existing methods, FlowScan exploits the intradependencies within sets to learn both global and local structure. FlowScan represents the first approach that is able to apply sequential methods to exchangeable density estimation without resorting to averaging over all possible permutations. We achieve new state-of-the-art performance on point cloud and image set modeling.

Autonomous systems are often required to operate in partially observable environments. They must reliably execute a specified objective even with incomplete information about the state of the environment. We propose a methodology to synthesize policies that satisfy a linear temporal logic formula in a partially observable Markov decision process (POMDP). By formulating a planning problem, we show how to use point-based value iteration methods to efficiently approximate the maximum probability of satisfying a desired logical formula and compute the associated belief state policy. We demonstrate that our method scales to large POMDP domains and provides strong bounds on the performance of the resulting policy.

We present new algorithms for computing and approximating bisimulation metrics in Markov Decision Processes (MDPs). Bisimulation metrics are an elegant formalism that capture behavioral equivalence between states and provide strong theoretical guarantees on differences in optimal behaviour. Unfortunately, their computation is expensive and requires a tabular representation of the states, which has thus far rendered them impractical for large problems. In this paper we present a new version of the metric that is tied to a behavior policy in an MDP, along with an analysis of its theoretical properties. We then present two new algorithms for approximating bisimulation metrics in large, deterministic MDPs. The first does so via sampling and is guaranteed to converge to the true metric. The second is a differentiable loss which allows us to learn an approximation even for continuous state MDPs, which prior to this work had not been possible.

As machine learning is increasingly used to make real-world decisions, recent research efforts aim to define and ensure fairness in algorithmic decision making. Existing methods often assume a fixed set of observable features to define individuals, but lack a discussion of certain features not being observed at test time. In this paper, we study fairness of naive Bayes classifiers, which allow partial observations. In particular, we introduce the notion of a discrimination pattern, which refers to an individual receiving different classifications depending on whether some sensitive attributes were observed. Then a model is considered fair if it has no such pattern. We propose an algorithm to discover and mine for discrimination patterns in a naive Bayes classifier, and show how to learn maximum-likelihood parameters subject to these fairness constraints. Our approach iteratively discovers and eliminates discrimination patterns until a fair model is learned. An empirical evaluation on three real-world datasets demonstrates that we can remove exponentially many discrimination patterns by only adding a small fraction of them as constraints.

Regret minimisation in stochastic multi-armed bandits is a well-studied problem, for which several optimal algorithms have been proposed. Such algorithms depend on (sufficient statistics of) the empirical reward distributions of the arms to decide which arm to pull next. In this paper, we consider the design of algorithms that are constrained to store statistics from only a bounded number of arms. For bandits with a finite set of arms, we derive a sub-linear upper bound on the regret that decreases with the “arm memory” size M. For instances with a large, possibly infinite, set of arms, we show a sub-linear bound on the quantile regret. Our problem formulation generalises that of Liau et al. (2018), who fix M = O(1), and so do not obtain bounds that depend on M. More importantly, our algorithms keep exploration and exploitation tightly coupled, without a dedicated exploration phase as employed by Liau et al. (2018). Although this choice makes our analysis harder, it leads to much-improved practical performance. For bandits with a large number of arms and no known structure on the rewards, our algorithms serve as a viable option. Unlike many other approaches to restrict the memory of bandit algorithms, our algorithms do not need any additional technical assumptions.

Some of the most prominent results in causal inference have been developed in the context of atomic interventions, following the semantics of the do-operator and the inferential power of the do-calculus. In practice, many real-world settings require more complex types of interventions that cannot be represented by a simple atomic intervention. In this paper, we investigate a general class of interventions that covers some non-trivial types of policies (conditional and stochastic), which goes beyond the atomic class. Our goal is to develop general understanding and formal machinery to be able to reason about the effects of those policies, similar to the robust treatment developed to handle the atomic case. Specifically, in this paper, we introduce a new set of inference rules (akin to do-calculus) that can be used to derive claims about general interventions, which we call σ-calculus. We develop a systematic and efficient procedure for finding estimands of the effect of general policies as a function of the available observational and experimental distributions. We then prove that our algorithm and σ-calculus are both sound for the tasks of identification (Pearl, 1995) and z-identification (Bareinboim and Pearl, 2012) under this class of interventions.

Skeleton Learning (SL) is the task for learning an undirected graph from the input data that captures their dependency relations. SL plays a pivotal role in causal learning and has attracted growing attention in the research community lately. Due to the high time complexity, anytime SL has emerged which learns a skeleton incrementally and improves it overtime. In this paper, we first propose and advocate the reliability requirement for anytime SL to be practically useful. Reliability requires the intermediately learned skeleton to have precision and persistency. We also present REAL, a novel Reliable and Efficient Anytime Learning algorithm of skeleton. Specifically, we point out that the commonly existing Functional Dependency (FD) among variables could make the learned skeleton violate faithfulness assumption, thus we propose a theory to resolve such incompatibility. Based on this, REAL conducts SL on a reduced set of variables with guaranteed correctness thus drastically improves efficiency. Furthermore, it employs a novel edge-insertion and best-first strategy in anytime fashion for skeleton growing to achieve high reliability and efficiency. We prove that the skeleton learned by REAL converges to the correct skeleton under standard assumptions. Thorough experiments were conducted on both benchmark and real-world datasets demonstrate that REAL significantly outperforms the other state-of-the-art algorithms.

Deception is a fundamental issue across a diverse array of settings, from cybersecurity, where decoys (e.g., honeypots) are an important tool, to politics that can feature politically motivated “leaks” and fake news about candidates. Typical considerations of deception view it as providing false information. However, just as important but less frequently studied is a more tacit form where information is strategically hidden or leaked. We consider the problem of how much an adversary can affect a principal's decision by “half-truths”, that is, by masking or hiding bits of information, when the principal is oblivious to the presence of the adversary. The principal's problem can be modeled as one of predicting future states of variables in a dynamic Bayes network, and we show that, while theoretically the principal's decisions can be made arbitrarily bad, the optimal attack is NP-hard to approximate, even under strong assumptions favoring the attacker. However, we also describe an important special case where the dependency of future states on past states is additive, in which we can efficiently compute an approximately optimal attack. Moreover, in networks with a linear transition function we can solve the problem optimally in polynomial time.

Learning from demonstrations (LfD) is an efficient paradigm to train AI agents. But major issues arise when there are differences between (a) the demonstrator's own sensory input, (b) our sensors that observe the demonstrator and (c) the sensory input of the agent we train. In this paper, we propose a causal model-based framework for transfer learning under such “sensor-shifts”, for two common LfD tasks: (1) inferring the effect of the demonstrator's actions and (2) imitation learning. First we rigorously analyze, on the population-level, to what extent the relevant underlying mechanisms (the action effects and the demonstrator policy) can be identified and transferred from the available observations together with prior knowledge of sensor characteristics. And we device an algorithm to infer these mechanisms. Then we introduce several proxy methods which are easier to calculate, estimate from finite data and interpret than the exact solutions, alongside theoretical bounds on their closeness to the exact ones. We validate our two main methods on simulated and semi-real world data.

Learning models with discrete latent variables using stochastic gradient descent remains a challenge due to the high variance of gradient estimates. Modern variance reduction techniques mostly consider categorical distributions and have limited applicability when the number of possible outcomes becomes large. In this work, we consider models with latent permutations and propose control variates for the Plackett-Luce distribution. In particular, the control variates allow us to optimize black-box functions over permutations using stochastic gradient descent. To illustrate the approach, we consider a variety of causal structure learning tasks for continuous and discrete data. We show that our method outperforms competitive relaxation-based optimization methods and is also applicable to non-differentiable score functions.

We consider the problem of counting the number of DAGs which are Markov-equivalent, i.e., which encode the same conditional independencies between random variables. The problem has been studied, among others, in the context of causal discovery, and it is known that it reduces to counting the number of so-called moral acyclic orientations of certain undirected graphs, notably chordal graphs. Our main empirical contribution is a new algorithm which outperforms previously known exact algorithms for the considered problem by a significant margin. On the theoretical side, we show that our algorithm is guaranteed to run in polynomial time on a broad class of chordal graphs, including interval graphs.

The success of MaxSAT (maximum satisfiability) solving in recent years has motivated researchers to apply MaxSAT solvers in diverse discrete combinatorial optimization problems. Group testing has been studied as a combinatorial optimization problem, where the goal is to find defective items among a set of items by performing sets of tests on items. In this paper, we propose a MaxSAT-based framework, called MGT, that solves group testing, in particular, the decoding phase of non-adaptive group testing. We extend this approach to the noisy variant of group testing, and propose a compact MaxSAT-based encoding that guarantees an optimal solution. Our extensive experimental results show that MGT can solve group testing instances of 10000 items with 3% defectivity, which no prior work can handle to the best of our knowledge. Furthermore, MGT has better accuracy than the LP-based approach. We also discover an interesting phase transition behavior in the runtime, which reveals the easy-hard-easy nature of group testing.

A number of approaches to causal discovery assume that there are no hidden confounders and are designed to learn a fixed causal model from a single data set. Over the last decade, with closer cooperation across laboratories, we are able to accumulate more variables and data for analysis, while each lab may only measure a subset of them, due to technical constraints or to save time and cost. This raises a question of how to handle causal discovery from multiple data sets with non-identical variable sets, and at the same time, it would be interesting to see how more recorded variables can help to mitigate the confounding problem. In this paper, we propose a principled method to uniquely identify causal relationships over the integrated set of variables from multiple data sets, in linear, non-Gaussian cases. The proposed method also allows distribution shifts across data sets. Theoretically, we show that the causal structure over the integrated set of variables is identifiable under testable conditions. Furthermore, we present two types of approaches to parameter estimation: one is based on maximum likelihood, and the other is likelihood free and leverages generative adversarial nets to improve scalability of the estimation procedure. Experimental results on various synthetic and real-world data sets are presented to demonstrate the efficacy of our methods.

Representations of sequential data are commonly based on the assumption that observed sequences are realizations of an unknown underlying stochastic process, where the learning problem includes determination of the model parameters. In this context, a model must be able to capture the multi-modal nature of the data, without blurring between single modes. This paper proposes probabilistic B'{e}zier curves (

We propose a formalization of the three-tier causal hierarchy of association, intervention, and counterfactuals as a series of probabilistic logical languages. Our languages are of strictly increasing expressivity, the first capable of expressing quantitative probabilistic reasoning—including conditional independence and Bayesian inference—the second encoding do-calculus reasoning for causal effects, and the third capturing a fully expressive do-calculus for arbitrary counterfactual queries. We give a corresponding series of finitary axiomatizations complete over both structural causal models and probabilistic programs, and show that satisfiability and validity for each language are decidable in polynomial space.

There are notable examples of online search improving over hand-coded or learned policies (e.g. AlphaZero) for sequential decision making. It is not clear, however, whether or not policy improvement is guaranteed for many of these approaches, even when given a perfect leaf evaluation function and transition model. Indeed, simple counterexamples show that seemingly reasonable online search procedures can hurt performance compared to the original policy. To address this issue, we introduce the choice function framework for analyzing online search procedures for policy improvement. A choice function specifies the actions to be considered at every node of a search tree, with all other actions being pruned. Our main contribution is to give sufficient conditions for stationary and non-stationary choice functions to guarantee that the value achieved by online search is no worse than the original policy. In addition, we describe a general parametric class of choice functions that satisfy those conditions and present an illustrative use case of the empirical utility of the framework.

Causal effect identification is one of the most prominent and well-understood problems in causal inference. Despite the generality and power of the results developed so far, there are still challenges in their applicability to practical settings, arguably due to the finitude of the samples. Simply put, there is a gap between causal effect identification and estimation. One popular setting in which sample-efficient estimators from finite samples exist is when the celebrated back-door condition holds. In this paper, we extend weighting-based methods developed for the back-door case to more general settings, and develop novel machinery for estimating causal effects using the weighting-based method as a building block. We derive graphical criteria under which causal effects can be estimated using this new machinery and demonstrate the effectiveness of the proposed method through simulation studies.

We present a novel framework for parallel exact inference in graphical models. Our framework supports error-correction during inference and enables fast verification that the result of inference is correct, with probabilistic soundness. The computational complexity of inference essentially matches the cost of w-cutset conditioning, a known generalization of Pearl's classical loop-cutset conditioning for inference. Verifying the result for correctness can be done with as little as essentially the square root of the cost of inference. Our main technical contribution amounts to designing a low-degree polynomial extension of the cutset approach, and then reducing to a univariate polynomial employing techniques recently developed for noninteractive probabilistic proof systems.

We introduce the safe linear stochastic bandit framework—a generalization of linear stochastic bandits—where, in each stage, the learner is required to select an arm with an expected reward that is no less than a predetermined (safe) threshold with high probability. We assume that the learner initially has knowledge of an arm that is known to be safe, but not necessarily optimal. Leveraging on this assumption, we introduce a learning algorithm that systematically combines known safe arms with exploratory arms to safely expand the set of safe arms over time, while facilitating safe greedy exploitation in subsequent stages. In addition to ensuring the satisfaction of the safety constraint at every stage of play, the proposed algorithm is shown to exhibit an expected regret that is no more than O(√T log(T)) after T stages of play.

The process of transporting and synthesizing experimental findings from heterogeneous data collections to construct causal explanations is arguably one of the most central and challenging problems in modern data science. This problem has been studied in the causal inference literature under the rubric of causal effect identifiability and transportability (Bareinboim and Pearl 2016). In this paper, we investigate a general version of this challenge where the goal is to learn conditional causal effects from an arbitrary combination of datasets collected under different conditions, observational or experimental, and from heterogeneous populations. Specifically, we introduce a unified graphical criterion that characterizes the conditions under which conditional causal effects can be uniquely determined from the disparate data collections. We further develop an efficient, sound, and complete algorithm that outputs an expression for the conditional effect whenever it exists, which synthesizes the available causal knowledge and empirical evidence; if the algorithm is unable to find a formula, then such synthesis is provably impossible, unless further parametric assumptions are made. Finally, we prove that do-calculus (Pearl 1995) is complete for this task, i.e., the inexistence of a do-calculus derivation implies the impossibility of constructing the targeted causal explanation.

Temporal logics over finite traces have recently seen wide application in a number of areas, from business process modelling, monitoring, and mining to planning and decision making. However, real-life dynamic systems contain a degree of uncertainty which cannot be handled with classical logics. We thus propose a new probabilistic temporal logic over finite traces using superposition semantics, where all possible evolutions are possible, until observed. We study the properties of the logic and provide automata-based mechanisms for deriving probabilistic inferences from its formulas. We then study a fragment of the logic with better computational properties. Notably, formulas in this fragment can be discovered from event log data using off-the-shelf existing declarative process discovery techniques.

Marginal MAP is a difficult mixed inference task for graphical models. Existing state-of-the-art algorithms for solving exactly this task are based on either depth-first or best-first sequential search over an AND/OR search space. In this paper, we explore and evaluate for the first time the power of parallel search for exact Marginal MAP inference. We introduce a new parallel shared-memory recursive best-first AND/OR search algorithm that explores the search space in a best-first manner while operating with limited memory. Subsequently, we develop a complete parallel search scheme that only parallelizes the conditional likelihood computations. We also extend the proposed algorithms into depth-first parallel search schemes. Our experiments on difficult benchmarks demonstrate the effectiveness of the parallel search algorithms against current sequential methods for solving Marginal MAP exactly.