Total: 19

Machine learning systems are increasingly being used to make impactful decisions such as loan applications and criminal justice risk assessments, and as such, ensuring fairness of these systems is critical. This is often challenging as the labels in the data are biased. This paper studies learning fair probability distributions from biased data by explicitly modeling a latent variable that represents a hidden, unbiased label. In particular, we aim to achieve demographic parity by enforcing certain independencies in the learned model. We also show that group fairness guarantees are meaningful only if the distribution used to provide those guarantees indeed captures the real-world data. In order to closely model the data distribution, we employ probabilistic circuits, an expressive and tractable probabilistic model, and propose an algorithm to learn them from incomplete data. We show on real-world datasets that our approach not only is a better model of how the data was generated than existing methods but also achieves competitive accuracy. Moreover, we also evaluate our approach on a synthetic dataset in which observed labels indeed come from fair labels but with added bias, and demonstrate that the fair labels are successfully retrieved.

An unbiased low-variance gradient estimator, termed GO gradient, was proposed recently for expectation-based objectives E_q_γ(y) [f(y)], where the random variable (RV) y may be drawn from a stochastic computation graph (SCG) with continuous (non-reparameterizable) internal nodes and continuous/discrete leaves. Based on the GO gradient, we present for E_q_γ(y) [f(y)] an unbiased low-variance Hessian estimator, named GO Hessian, which contains the deterministic Hessian as a special case. Considering practical implementation, we reveal that the GO Hessian in expectation obeys the chain rule and is therefore easy-to-use with auto-differentiation and Hessian-vector products, enabling efficient cheap exploitation of curvature information over deep SCGs. As representative examples, we present the GO Hessian for non-reparameterizable gamma and negative binomial RVs/nodes. Leveraging the GO Hessian, we develop a new second-order method for E_q_γ(y) [f(y)], with challenging experiments conducted to verify its effectiveness and efficiency.

In the influence maximization (IM) problem, we are given a social network and a budget k, and we look for a set of k nodes in the network, called seeds, that maximize the expected number of nodes that are reached by an influence cascade generated by the seeds, according to some stochastic model for influence diffusion. Extensive studies have been done on the IM problem, since his definition by Kempe, Kleinberg, and Tardos (2003). However, most of the work focuses on the non-adaptive version of the problem where all the k seed nodes must be selected before that the cascade starts. In this paper we study the adaptive IM, where the nodes are selected sequentially one by one, and the decision on the i-th seed can be based on the observed cascade produced by the first i-1 seeds. We focus on the full-adoption feedback in which we can observe the entire cascade of each previously selected seed and on the independent cascade model where each edge is associated with an independent probability of diffusing influence. Previous works showed that there are constant upper bounds on the adaptivity gap, which compares the performance of an adaptive algorithm against a non-adaptive one, but the analyses used to prove these bounds only works for specific graph classes such as in-arborescences, out-arborescences, and one-directional bipartite graphs. Our main result is the first sub-linear upper bound that holds for any graph. Specifically, we show that the adaptivity gap is upper-bounded by ∛n+1, where n is the number of nodes in the graph. Moreover we improve over the known upper bound for in-arborescences from 2e/(e-1)≈3.16 to 2e²/(e²-1)≈2.31. Finally, we study α-bounded graphs, a class of undirected graphs in which the sum of node degrees higher than two is at most α, and show that the adaptivity gap is upper-bounded by √α+O(1). Moreover, we show that in 0-bounded graphs, i.e. undirected graphs in which each connected component is a path or a cycle, the adaptivity gap is at most 3e³/(e³-1)≈3.16. To prove our bounds, we introduce new techniques to relate adaptive policies with non-adaptive ones that might be of their own interest.

Despite the popularity of Convolutional Neural Networks (CNN), the problem of uncertainty quantification (UQ) of CNN has been largely overlooked. Lack of efficient UQ tools severely limits the application of CNN in certain areas, such as medicine, where prediction uncertainty is critically important. Among the few existing UQ approaches that have been proposed for deep learning, none of them has theoretical consistency that can guarantee the uncertainty quality. To address this issue, we propose a novel bootstrap based framework for the estimation of prediction uncertainty. The inference procedure we use relies on convexified neural networks to establish the theoretical consistency of bootstrap. Our approach has a significantly less computational load than its competitors, as it relies on warm-starts at each bootstrap that avoids refitting the model from scratch. We further explore a novel transfer learning method so our framework can work on arbitrary neural networks. We experimentally demonstrate our approach has a much better performance compared to other baseline CNNs and state-of-the-art methods on various image datasets.

Robust Markov Decision Processes (MDPs) are a powerful framework for modeling sequential decision making problems with model uncertainty. This paper proposes the first first-order framework for solving robust MDPs. Our algorithm interleaves primal-dual first-order updates with approximate Value Iteration updates. By carefully controlling the tradeoff between the accuracy and cost of Value Iteration updates, we achieve an ergodic convergence rate that is significantly better than classical Value Iteration algorithms in terms of the number of states S and the number of actions A on ellipsoidal and Kullback-Leibler s-rectangular uncertainty sets. In numerical experiments on ellipsoidal uncertainty sets we show that our algorithm is significantly more scalable than state-of-the-art approaches. Our framework is also the first one to solve robust MDPs with s-rectangular KL uncertainty sets.

Level Set Estimation (LSE) is an important problem with applications in various fields such as material design, biotechnology, machine operational testing, etc. Existing techniques suffer from the scalability issue, that is, these methods do not work well with high dimensional inputs. This paper proposes novel methods to solve the high dimensional LSE problems using Bayesian Neural Networks. In particular, we consider two types of LSE problems: (1) explicit LSE problem where the threshold level is a fixed user-specified value, and, (2) implicit LSE problem where the threshold level is defined as a percentage of the (unknown) maximum of the objective function. For each problem, we derive the corresponding theoretic information based acquisition function to sample the data points so as to maximally increase the level set accuracy. Furthermore, we also analyse theoretical time complexity of our proposed acquisition functions, and suggest a practical methodology to efficiently tune the network hyper-parameters to achieve high model accuracy. Numerical experiments on both synthetic and real-world datasets show that our proposed methods can achieve better results compared to existing state-of-the-art approaches.

Causal inference from observational data is receiving wide applications in many fields. However, unidentifiable situations, where causal effects cannot be uniquely computed from observational data, pose critical barriers to applying causal inference to complicated real applications. In this paper, we develop a bounding method for estimating the average causal effect (ACE) under unidentifiable situations due to hidden confounding based on Pearl's structural causal model. We propose to parameterize the unknown exogenous random variables and structural equations of a causal model using neural networks and implicit generative models. Then, using an adversarial learning framework, we search the parameter space to explicitly traverse causal models that agree with the given observational distribution, and find those that minimize or maximize the ACE to obtain its lower and upper bounds. The proposed method does not make assumption about the type of structural equations and variables. Experiments using both synthetic and real-world datasets are conducted.

Identifying causal effects from observational data is a pervasive challenge found throughout the empirical sciences. Very general methods have been developed to decide the identifiability of a causal quantity from a combination of observational data and causal knowledge about the underlying system. In practice, however, there are still challenges to estimating identifiable causal functionals from finite samples. Recently, a method known as double/debiased machine learning (DML) (Chernozhukov et al. 2018) has been proposed to learn parameters leveraging modern machine learning techniques, which is both robust to model misspecification and bias-reducing. Still, DML has only been used for causal estimation in settings when the back-door condition (also known as conditional ignorability) holds. In this paper, we develop a new, general class of estimators for any identifiable causal functionals that exhibit DML properties, which we name DML-ID. In particular, we introduce a complete identification algorithm that returns an influence function (IF) for any identifiable causal functional. We then construct the DML estimator based on the derived IF. We show that DML-ID estimators hold the key properties of debiasedness and doubly robustness. Simulation results corroborate with the theory.

Contextual bandits algorithms have become essential in real-world user interaction problems in recent years. However, these algorithms represent context as attribute value representation, which makes them infeasible for real world domains like social networks, which are inherently relational. We propose Relational Boosted Bandits (RB2), a contextual bandits algorithm for relational domains based on (relational) boosted trees. RB2 enables us to learn interpretable and explainable models due to the more descriptive nature of the relational representation. We empirically demonstrate the effectiveness and interpretability of RB2 on tasks such as link prediction, relational classification, and recommendation.

This paper deals with the identification problem of causal effects in randomized trials with noncompliance. In this problem, generally, causal effects are not identifiable and thus have been evaluated under some strict assumptions, or through the bounds. Different from existing studies, we propose novel identification conditions of joint probabilities of potential outcomes, which allow us to derive a consistent estimator of the causal effect. Regarding the identification conditions of joint probabilities of potential outcomes, the assumptions of monotonicity (Pearl, 2009), independence between potential outcomes (Robins & Richardson, 2011), gain equality (Li & Pearl, 2019) and specific functional relationships between cause and effect (Pearl, 2009) have been utilized. In contrast, without such assumptions, the proposed conditions enable us to evaluate joint probabilities of potential outcomes using an instrumental variable and a proxy variable of potential outcomes. The results of this paper extend the range of solvable identification problems in causal inference.

We propose a new framework to learn non-parametric graphical models from continuous observational data. Our method is based on concepts from information theory in order to discover independences and causality between variables: the conditional and multivariate mutual information (such as \cite{verny2017learning} for discrete models). To estimate these quantities, we propose non-parametric estimators relying on the Bernstein copula and that are constructed by exploiting the relation between the mutual information and the copula entropy \cite{ma2011mutual, belalia2017testing}. To our knowledge, this relation is only documented for the bivariate case and, for the need of our algorithms, is here extended to the conditional and multivariate mutual information. This framework leads to a new algorithm to learn continuous non-parametric Bayesian network. Moreover, we use this estimator to speed up the BIC algorithm proposed in \cite{elidan2010copula} by taking advantage of the decomposition of the likelihood function in a sum of mutual information \cite{koller2009probabilistic}. Finally, our method is compared in terms of performances and complexity with other state of the art techniques to learn Copula Bayesian Networks and shows superior results. In particular, it needs less data to recover the true structure and generalizes better on data that are not sampled from Gaussian distributions.

Influence diagrams (IDs) are graphical models for representing and reasoning with sequential decision-making problems under uncertainty. Limited memory influence diagrams (LIMIDs) model a decision-maker (DM) who forgets the history in the course of making a sequence of decisions. The standard inference task in IDs and LIMIDs is to compute the maximum expected utility (MEU), which is one of the most challenging tasks in graphical models. We present a model decomposition framework in both IDs and LIMIDs, which we call submodel decomposition that generates a tree of single-stage decision problems through a tree clustering scheme. We also develop a valuation algebra over the submodels that leads to a hierarchical message passing algorithm that propagates conditional expected utility functions over a submodel-tree as external messages. We show that the overall complexity is bounded by the maximum tree-width over the submodels, common in graphical model algorithms. Finally, we present a new method for computing upper bounds over a submodel-tree by first exponentiating the utility functions yielding a standard probabilistic graphical model as an upper bound and then applying standard variational upper bounds for the marginal MAP inference, yielding tighter upper bounds compared with state-of-the-art bounding schemes for the MEU task.

Influence diagrams provide a modeling and inference framework for sequential decision problems, representing the probabilistic knowledge by a Bayesian network and the preferences of an agent by utility functions over the random variables and decision variables. Computing the maximum expected utility (MEU) and the optimizing policy is exponential in the constrained induced width and therefore is notoriously difficult for larger models. In this paper, we develop a new bounding scheme for MEU that applies partitioning based approximations on top of the decomposition scheme called a multi-operator cluster DAG for influence diagrams that is more sensitive to the underlying structure of the model than the classical join-tree decomposition of influence diagrams. Our bounding scheme utilizes a cost-shifting mechanism to tighten the bound further. We demonstrate the effectiveness of the proposed scheme on various hard benchmarks.

We consider the problem of estimating a spectral risk measure (SRM) from i.i.d. samples, and propose a novel method that is based on numerical integration. We show that our SRM estimate concentrates exponentially, when the underlying distribution has bounded support. Further, we also consider the case when the underlying distribution satisfies an exponential moment bound, which includes sub-Gaussian and subexponential distributions. For these distributions, we derive a concentration bound for our estimation scheme. We validate the theoretical findings on a synthetic setup, and in a vehicular traffic routing application.

We introduce Probabilistic Dependency Graphs (PDGs), a new class of directed graphical models. PDGs can capture inconsistent beliefs in a natural way and are more modular than Bayesian Networks (BNs), in that they make it easier to incorporate new information and restructure the representation. We show by example how PDGs are an especially natural modeling tool. We provide three semantics for PDGs, each of which can be derived from a scoring function (on joint distributions over the variables in the network) that can be viewed as representing a distribution's incompatibility with the PDG. For the PDG corresponding to a BN, this function is uniquely minimized by the distribution the BN represents, showing that PDG semantics extend BN semantics. We show further that factor graphs and their exponential families can also be faithfully represented as PDGs, while there are significant barriers to modeling a PDG with a factor graph.

Upper confidence bound (UCB) based contextual bandit algorithms require one to know the tail property of the reward distribution. Unfortunately, such tail property is usually unknown or difficult to specify in real-world applications. Using a tail property heavier than the ground truth leads to a slow learning speed of the contextual bandit algorithm, while using a lighter one may cause the algorithm to diverge. To address this fundamental problem, we develop an estimator (evaluated from historical rewards) for the contextual bandit UCB based on the multiplier bootstrapping technique. We first establish sufficient conditions under which our estimator converges asymptotically to the ground truth of contextual bandit UCB. We further derive a second order correction for our estimator so as to obtain its confidence level with a finite number of rounds. To demonstrate the versatility of the estimator, we apply it to design a BootLinUCB algorithm for the contextual bandit. We prove that the BootLinUCB has a sub-linear regret upper bound and also conduct extensive experiments to validate its superior performance.

The creation of Bayesian networks often requires the specification of a large number of parameters, making it highly desirable to be able to learn these parameters from historical data. In many cases, such data has uncertainty associated with it, including cases in which this data comes from unstructured analysis or from sensors. When creating diagnosis networks, for example, unstructured analysis algorithms can be run on the historical text descriptions or images of previous cases so as to extract data for learning Bayesian network parameters, but such derived data has inherent uncertainty associated with it due to the nature of such algorithms. Because of the inability of current Bayesian network parameter learning algorithms to incorporate such uncertainty, common approaches either ignore this uncertainty, thus reducing the resulting accuracy, or completely disregard such data. We present an approach for learning Bayesian network parameters that explicitly incorporates such uncertainty, and which is a natural extension of the Bayesian network formalism. We present a generalization of the Expectation Maximization parameter learning algorithm that enables it to handle any historical data with likelihood-evidence-based uncertainty, as well as an empirical validation demonstrating the improved accuracy and convergence enabled by our approach. We also prove that our extended algorithm maintains the convergence and correctness properties of the original EM algorithm, while explicitly incorporating data uncertainty in the learning process.

Counting and uniform sampling of directed acyclic graphs (DAGs) from a Markov equivalence class are fundamental tasks in graphical causal analysis. In this paper, we show that these tasks can be performed in polynomial time, solving a long-standing open problem in this area. Our algorithms are effective and easily implementable. Experimental results show that the algorithms significantly outperform state-of-the-art methods.

We investigate the problem of bounding causal effects from experimental studies in which treatment assignment is randomized but the subject compliance is imperfect. It is well known that under such conditions, the actual causal effects are not point-identifiable due to uncontrollable unobserved confounding. In their seminal work, Balke and Pearl (1994) derived the tightest bounds over the causal effects in this settings by employing an algebra program to derive analytic expressions. However, Pearl's approach assumes the primary outcome to be discrete and finite. Solving such a program could be intractable when high-dimensional context variables are present. In this paper, we present novel non-parametric methods to bound causal effects on the continuous outcome from studies with imperfect compliance. These bounds could be generalized to settings with a high-dimensional context.