| Total: 15

Hashing-based model counting has emerged as a promising approach for large-scale probabilistic inference on graphical models. A key component of these techniques is the use of xor-based 2-universal hash functions that operate over Boolean domains. Many counting problems arising in probabilistic inference are, however, naturally encoded over finite discrete domains. Techniques based on bit-level (or Boolean) hash functions require these problems to be propositionalized, making it impossible to leverage the remarkable progress made in SMT (Satisfiability Modulo Theory) solvers that can reason directly over words (or bit-vectors). In this work, we present the first approximate model counter that uses word-level hashing functions, and can directly leverage the power of sophisticated SMT solvers. Empirical evaluation over an extensive suite of benchmarks demonstrates the promise of the approach.

Many applications call for learning causal models from relational data. We investigate Relational Causal Models (RCM) under relational counterparts of adjacency-faithfulness and orientation-faithfulness, yielding a simple approach to identifying a subset of relational d-separation queries needed for determining the structure of an RCM using d-separation against an unrolled DAG representation of the RCM. We provide original theoretical analysis that offers the basis of a sound and efficient algorithm for learning the structure of an RCM from relational data. We describe RCD-Light, a sound and efficient constraint-based algorithm that is guaranteed to yield a correct partially-directed RCM structure with at least as many edges oriented as in that produced by RCD, the only other existing algorithm for learning RCM. We show that unlike RCD, which requires exponential time and space, RCD-Light requires only polynomial time and space to orient the dependencies of a sparse RCM.

Bounding the tree-width of a Bayesian network can reduce the chance of overfitting, and allows exact inference to be performed efficiently. Several existing algorithms tackle the problem of learning bounded tree-width Bayesian networks by learning from k-trees as super-structures, but they do not scale to large domains and/or large tree-width. We propose a guided search algorithm to find k-trees with maximum Informative scores, which is a measure of quality for the k-tree in yielding good Bayesian networks. The algorithm achieves close to optimal performance compared to exact solutions in small domains, and can discover better networks than existing approximate methods can in large domains. It also provides an optimal elimination order of variables that guarantees small complexity for later runs of exact inference. Comparisons with well-known approaches in terms of learning and inference accuracy illustrate its capabilities.

Probabilistic inference in many real-world problems requires graphical models with deterministic algebraic constraints between random variables (e.g., Newtonian mechanics, Pascal’s law, Ohm’s law) that are known to be problematic for many inference methods such as Monte Carlo sampling. Fortunately, when such constraintsare invertible, the model can be collapsed and the constraints eliminated through the well-known Jacobian-based change of variables. As our first contributionin this work, we show that a much broader classof algebraic constraints can be collapsed by leveraging the properties of a Dirac delta model of deterministic constraints. Unfortunately, the collapsing processcan lead to highly piecewise densities that pose challenges for existing probabilistic inference tools. Thus,our second contribution to address these challenges is to present a variation of Gibbs sampling that efficiently samples from these piecewise densities. The key insight to achieve this is to introduce a class of functions that (1) is sufficiently rich to approximate arbitrary models up to arbitrary precision, (2) is closed under dimension reduction (collapsing) for models with (non)linear algebraic constraints and (3) always permits one analytical integral sufficient to automatically derive closed-form conditionals for Gibbs sampling. Experiments demonstrate the proposed sampler converges at least an order of magnitude faster than existing Monte Carlo samplers.

This paper explores the anytime performance of search-based algorithms for solving the Marginal MAP task over graphical models. The current state of the art for solving this challenging task is based on best-first search exploring the AND/OR graph with the guidance of heuristics based on mini-bucket and variational cost-shifting principles. Yet, those schemes are uncompromising in that they solve the problem exactly, or not at all, and often suffer from memory problems. In this work, we explore the well known principle of weighted search for converting best-first search solvers into anytime schemes. The weighted best-first search schemes report a solution early in the process by using inadmissible heuristics, and subsequently improve the solution. While it was demonstrated recently that weighted schemes can yield effective anytime behavior for pure MAP tasks, Marginal MAP is far more challenging (e.g., a conditional sum must be evaluated for every solution). Yet, in an extensive empirical analysis we show that weighted schemes are indeed highly effective for Marginal MAP yielding the most competitive schemes to date for this task.

We consider the problem of sampling from a discrete probability distribution specified by a graphical model. Exact samples can, in principle, be obtained by computing the mode of the original model perturbed with an exponentially many i.i.d. random variables. We propose a novel algorithm that views this as a combinatorial optimization problem and searches for the extreme state using a standard integer linear programming (ILP) solver, appropriately extended to account for the random perturbation. Our technique, GumbelMIP, leverages linear programming (LP) relaxations to evaluate the qualityof samples and prune large portions of the search space, and can thus scale to large tree-width models beyond the reach of current exact inference methods. Further, when the optimization problem is not solved to optimality, our method yields a novel approximate sampling technique. We empirically demonstrate that our approach parallelizes well, our exact sampler scales better than alternative approaches, and our approximate sampler yields better quality samples than a Gibbs sampler and a low-dimensional perturbation method.

POMDPs are standard models for probabilistic planning problems, where an agent interacts with an uncertain environment. We study the problem of almost-sure reachability, where given a set of target states, the question is to decide whether there is a policy to ensure that the target set is reached with probability 1 (almost-surely). While in general the problem is EXPTIME-complete, in many practical cases policies with a small amount of memory suffice. Moreover, the existing solution to the problem is explicit, which first requires to construct explicitly an exponential reduction to a belief-support MDP. In this work, we first study the existence of observation-stationary strategies, which is NP-complete, and then small-memory strategies. We present a symbolic algorithm by an efficient encoding to SAT and using a SAT solver for the problem. We report experimental results demonstrating the scalability of our symbolic (SAT-based) approach.

Autonomous agents operating in partially observable stochastic environments often face the problem of optimizing expected performance while bounding the risk of violating safety constraints. Such problems can be modeled as chance-constrained POMDP's (CC-POMDP's). Our first contribution is a systematic derivation of execution risk in POMDP domains, which improves upon how chance constraints are handled in the constrained POMDP literature. Second, we present RAO*, a heuristic forward search algorithm producing optimal, deterministic, finite-horizon policies for CC-POMDP's. In addition to the utility heuristic, RAO* leverages an admissible execution risk heuristic to quickly detect and prune overly-risky policy branches. Third, we demonstrate the usefulness of RAO* in two challenging domains of practical interest: power supply restoration and autonomous science agents.

In practice the vast majority of causal effect estimations from observational data are computed using adjustment sets which avoid confounding by adjusting for appropriate covariates. Recently several graphical criteria for selecting adjustment sets have been proposed. They handle causal directed acyclic graphs (DAGs) as well as more general types of graphs that represent Markov equivalence classes of DAGs, including completed partially directed acyclic graphs (CPDAGs). Though expressed in graphical language, it is not obvious how the criteria can be used to obtain effective algorithms for finding adjustment sets. In this paper we provide a new criterion which leads to an efficient algorithmic framework to find, test and enumerate covariate adjustments for chain graphs - mixed graphs representing in a compact way a broad range of Markov equivalence classes of DAGs.

Spatio-temporal matching of services to customers online is a problem that arises on a large scale in many domains associated with shared transportation (ex: taxis, ride sharing, super shuttles, etc.) and delivery services (ex: food, equipment, clothing, home fuel, etc.). A key characteristic of these problems is that matching of services to customers in one round has a direct impact on the matching of services to customers in the next round. For instance, in the case of taxis, in the second round taxis can only pick up customers closer to the drop off point of the customer from the first round of matching. Traditionally, greedy myopic approaches have been adopted to address such large scale online matching problems. While they provide solutions in a scalable manner, due to their myopic nature the quality of matching obtained can be improved significantly (demonstrated in our experimental results). In this paper, we present a two stage stochastic optimization formulation to consider expected future demand. We then provide multiple enhancements to solve large scale problems more effectively and efficiently. Finally, we demonstrate the significant improvement provided by our techniques over myopic approaches on two real world taxi data sets.

Many inference problems are naturally formulated using hard and soft constraints over relational domains: the desired solution must satisfy the hard constraints, while optimizing the objectives expressed by the soft constraints. Existing techniques for solving such constraints rely on efficiently grounding a sufficient subset of constraints that is tractable to solve. We present an eager-lazy grounding algorithm that eagerly exploits proofs and lazily refutes counterexamples. We show that our algorithm achieves significant speedup over existing approaches without sacrificing soundness for real-world applications from information retrieval and program analysis.

We propose the structured naive Bayes (SNB) classifier, which augments the ubiquitous naive Bayes classifier with structured features. SNB classifiers facilitate the use of complex features, such as combinatorial objects (e.g., graphs, paths and orders) in a general but systematic way. Underlying the SNB classifier is the recently proposed Probabilistic Sentential Decision Diagram (PSDD), which is a tractable representation of probability distributions over structured spaces. We illustrate the utility and generality of the SNB classifier via case studies. First, we show how we can distinguish players of simple games in terms of play style and skill level based purely on observing the games they play. Second, we show how we can detect anomalous paths taken on graphs based purely on observing the paths themselves.

Cutset networks — OR (decision) trees that have Bayesian networks whose treewidth is bounded by one at each leaf — are a new class of tractable probabilistic models that admit fast, polynomial-time inference and learning algorithms. This is unlike other state-of-the-art tractable models such as thin junction trees, arithmetic circuits and sum-product networks in which inference is fast and efficient but learning can be notoriously slow. In this paper, we take advantage of this unique property to develop fast algorithms for learning ensembles of cutset networks. Specifically, we consider generalized additive mixtures of cutset networks and develop sequential boosting-based and parallel bagging-based approaches for learning them from data. We demonstrate, via a thorough experimental evaluation, that our new algorithms are superior to competing approaches in terms of test-set log-likelihood score and learning time.

The maximum likelihood estimator (MLE) is generally asymptotically consistent but is susceptible to over-fitting. To combat this problem, regularization methods which reduce the variance at the cost of (slightly) increasing the bias are often employed in practice. In this paper, we present an alternative variance reduction (regularization) technique that quantizes the MLE estimates as a post processing step, yielding a smoother model having several tied parameters. We provide and prove error bounds for our new technique and demonstrate experimentally that it often yields models having higher test-set log-likelihood than the ones learned using the MLE. We also propose a new importance sampling algorithm for fast approximate inference in models having several tied parameters. Our experiments show that our new inference algorithm is superior to existing approaches such as Gibbs sampling and MC-SAT on models having tied parameters, learned using our quantization-based approach.

Many recent algorithms for approximate model counting are based on a reduction to combinatorial searches over random subsets of the space defined by parity or XOR constraints. Long parity constraints (involving many variables) provide strong theoretical guarantees but are computationally difficult. Short parity constraints are easier to solve but have weaker statistical properties. It is currently not known how long these parity constraints need to be. We close the gap by providing matching necessary and sufficient conditions on the required asymptotic length of the parity constraints. Further, we provide a new family of lower bounds and the first non-trivial upper bounds on the model count that are valid for arbitrarily short XORs. We empirically demonstrate the effectiveness of these bounds on model counting benchmarks and in a Satisfiability Modulo Theory (SMT) application motivated by the analysis of contingency tables in statistics.