| Total: 22

A standard approach for inference in probabilistic formalisms with first-order constructs is lifted variable elimination (LVE) for single queries. To handle multiple queries efficiently, the lifted junction tree algorithm (LJT) employs a first-order cluster representation of a model and LVE as a subroutine. Both algorithms answer conjunctive queries of propositional random variables, shattering the model on the query, which causes unnecessary groundings for conjunctive queries of interchangeable variables. This paper presents parameterised queries as a means to avoid groundings, applying the lifting idea to queries. Parameterised queries enable LVE and LJT to compute answers faster, while compactly representing queries and answers.

We propose a new localized inference algorithm for answering marginalization queries in large graphical models with the correlation decay property. Given a query variable and a large graphical model, we define a much smaller model in a local region around the query variable in the target model so that the marginal distribution of the query variable can be accurately approximated. We introduce two approximation error bounds based on the Dobrushin’s comparison theorem and apply our bounds to derive a greedy expansion algorithm that efficiently guides the selection of neighbor nodes for localized inference. We verify our theoretical bounds on various datasets and demonstrate that our localized inference algorithm can provide fast and accurate approximation for large graphical models.

A seller with unlimited inventory of a digital good interacts with potential buyers with i.i.d. valuations. The seller can adaptively quote prices to each buyer to maximize long-term profits, but does not know the valuation distribution exactly. Under a linear demand model, we consider two information settings: partially censored, where agents who buy reveal their true valuations after the purchase is completed, and completely censored, where agents never reveal their valuations. In the partially censored case, we prove that myopic pricing with a Pareto prior is Bayes optimal and has finite regret. In both settings, we evaluate the myopic strategy against more sophisticated look-aheads using three valuation distributions generated from real data on auctions of physical goods, keyword auctions, and user ratings, where the linear demand assumption is clearly violated. For some datasets, complete censoring actually helps, because the restricted data acts as a "regularizer" on the posterior, preventing it from being affected too much by outliers.

This paper considers the problem of removing costly features from a Bayesian network classifier. We want the classifier to be robust to these changes, and maintain its classification behavior. To this end, we propose a closeness metric between Bayesian classifiers, called the expected classification agreement (ECA). Our corresponding trimming algorithm finds an optimal subset of features and a new classification threshold that maximize the expected agreement, subject to a budgetary constraint. It utilizes new theoretical insights to perform branch-and-bound search in the space of feature sets, while computing bounds on the ECA. Our experiments investigate both the runtime cost of trimming and its effect on the robustness and accuracy of the final classifier.

Matrix Factorization (MF) is widely used in Recommender Systems (RSs) for estimating missing ratings in the rating matrix. MF faces major challenges of handling very sparse and large data. Poisson Factorization (PF) as an MF variant addresses these challenges with high efficiency by only computing on those non-missing elements. However, ignoring the missing elements in computation makes PF weak or incapable for dealing with columns or rows with very few observations (corresponding to sparse items or users). In this work, Metadata-dependent Poisson Factorization (MPF) is invented to address the user/item sparsity by integrating user/item metadata into PF. MPF adds the metadata-based observed entries to the factorized PF matrices. In addition, similar to MF, choosing the suitable number of latent components for PF is very expensive on very large datasets. Accordingly, we further extend MPF to Metadata-dependent Infinite Poisson Factorization (MIPF) that integrates Bayesian Nonparametric (BNP) technique to automatically tune the number of latent components. Our empirical results show that, by integrating metadata, MPF/MIPF significantly outperform the state-of-the-art PF models for sparse and large datasets. MIPF also effectively estimates the number of latent components.

By optimizing probability distributions over discrete latent codes, Stochastic Generative Hashing (SGH) bypasses the critical and intractable binary constraints on hash codes. While encouraging results were reported, SGH still suffers from the deficient usage of latent codes, i.e., there often exist many uninformative latent dimensions in the code space, a disadvantage inherited from its auto-encoding variational framework. Motivated by the fact that code redundancy usually is severer when more complex decoder network is used, in this paper, we propose a constrained deep generative architecture to simplify the decoder for data reconstruction. Specifically, our new framework forces the latent hashing codes to not only reconstruct data through the generative network but also retain minimal squared L2 difference to the last real-valued network hidden layer. Furthermore, during posterior inference, we propose to regularize the standard auto-encoding objective with an additional term that explicitly accounts for the negative redundancy degree of latent code dimensions. We interpret such modifications as Bayesian posterior regularization and design an adversarial strategy to optimize the generative, the variational, and the redundancy-resistanting parameters. Empirical results show that our new method can significantly boost the quality of learned codes and achieve state-of-the-art performance for image retrieval.

Computing the effects of interventions from observational data is an important task encountered in many data-driven sciences. The problem is addressed by identifying the post-interventional distribution with an expression that involves only quantities estimable from the pre-interventional distribution over observed variables, given some knowledge about the causal structure. In this work, we relax the requirement of having a fully specified causal structure and study the identifiability of effects with a singleton intervention (X), supposing that the structure is known only up to an equivalence class of causal diagrams, which is the output of standard structural learning algorithms (e.g., FCI). We derive a necessary and sufficient graphical criterion for the identifiability of the effect of X on all observed variables. We further establish a sufficient graphical criterion to identify the effect of X on a subset of the observed variables, and prove that it is strictly more powerful than the current state-of-the-art result on this problem.

Weighted model integration (WMI) extends weighted model counting (WMC) to the integration of functions over mixed discrete-continuous probability spaces. It has shown tremendous promise for solving inference problems in graphical models and probabilistic programs. Yet, state-of-the-art tools for WMI are generally limited either by the range of amenable theories, or in terms of performance. To address both limitations, we propose the use of extended algebraic decision diagrams (XADDs) as a compilation language for WMI. Aside from tackling typical WMI problems, XADDs also enable partial WMI yielding parametrized solutions. To overcome the main roadblock of XADDs -- the computational cost of integration -- we formulate a novel and powerful exact symbolic dynamic programming (SDP) algorithm that seamlessly handles Boolean, integer-valued and real variables, and is able to effectively cache partial computations, unlike its predecessor. Our empirical results demonstrate that these contributions can lead to a significant computational reduction over existing probabilistic inference algorithms.

Policy optimization on high-dimensional continuous control tasks exhibits its difficulty caused by the large variance of the policy gradient estimators. We present the action subspace dependent gradient (ASDG) estimator which incorporates the Rao-Blackwell theorem (RB) and Control Variates (CV) into a unified framework to reduce the variance. To invoke RB, our proposed algorithm (POSA) learns the underlying factorization structure among the action space based on the second-order advantage information. POSA captures the quadratic information explicitly and efficiently by utilizing the wide \& deep architecture. Empirical studies show that our proposed approach demonstrates the performance improvements on high-dimensional synthetic settings and OpenAI Gym's MuJoCo continuous control tasks.

Sparse connectivity is an important factor behind the success of convolutional neural networks and recurrent neural networks. In this paper, we consider the problem of learning sparse connectivity for feedforward neural networks (FNNs). The key idea is that a unit should be connected to a small number of units at the next level below that are strongly correlated. We use Chow-Liu's algorithm to learn a tree-structured probabilistic model for the units at the current level, use the tree to identify subsets of units that are strongly correlated, and introduce a new unit with receptive field over the subsets. The procedure is repeated on the new units to build multiple layers of hidden units. The resulting model is called a TRF-net. Empirical results show that, when compared to dense FNNs, TRF-net achieves better or comparable classification performance with much fewer parameters and sparser structures. They are also more interpretable.

Patent litigation is an expensive legal process faced by many companies. To reduce the cost of patent litigation, one effective approach is proactive management based on predictive analysis. However, automatic prediction of patent litigation is still an open problem due to the complexity of lawsuits. In this paper, we propose a data-driven framework, Convolutional Tensor Factorization (CTF), to identify the patents that may cause litigations between two companies. Specifically, CTF is a hybrid modeling approach, where the content features from the patents are represented by the Network embedding-combined Convolutional Neural Network (NCNN) and the lawsuit records of companies are summarized in a tensor, respectively. Then, CTF integrates NCNN and tensor factorization to systematically exploit both content information and collaborative information from large amount of data. Finally, the risky patents will be returned by a learning to rank strategy. Extensive experimental results on real-world data demonstrate the effectiveness of our framework.

This paper presents a principled way for dealing with occlusions in visual tracking which is a long-standing issue in computer vision but largely remains unsolved. As the major innovation, we develop a learning-based jump-diffusion process to jointly track object locations and estimate their visibility statuses over time. Our method employs in particular a set of jump dynamics to change object's visibility statuses and a set of diffusion dynamics to track objects in videos. Different from the traditional jump-diffusion process that stochastically generates dynamics, we utilize deep policy functions to determine the best dynamic at the present step and learn the optimal policies from raw videos using reinforcement learning methods.Our method is capable of tracking objects with severe occlusions in crowded scenes and thus recovers the complete trajectories of objects that undergo multiple interactions with others. We evaluate the proposed method on challenging video sequences and compare it to alternative methods. Significant improvements are obtained particularly for the videos including frequent interactions or occlusions.

We present a model for exact recursive Bayesian filtering based on lifted multiset states. Combining multisets with lifting makes it possible to simultaneously exploit multiple strategies for reducing inference complexity when compared to list-based grounded state representations. The core idea is to borrow the concept of Maximally Parallel Multiset Rewriting Systems and to enhance it by concepts from Rao-Blackwellization and Lifted Inference, giving a representation of state distributions that enables efficient inference. In worlds where the random variables that define the system state are exchangeable -- where the identity of entities does not matter -- it automatically uses a representation that abstracts from ordering (achieving an exponential reduction in complexity) -- and it automatically adapts when observations or system dynamics destroy exchangeability by breaking symmetry.

The Marginal MAP inference task is known to be extremely hard particularly because the evaluation of each complete MAP assignment involves an exact likelihood computation (a combinatorial sum). For this reason, most recent state-of-the-art solvers that focus on computing anytime upper and lower bounds on the optimal value are limited to solving instances with tractable conditioned summation subproblems. In this paper, we develop new search-based bounding schemes for Marginal MAP that produce anytime upper and lower bounds without performing exact likelihood computations. The empirical evaluation demonstrates the effectiveness of our new methods against the current best-performing search-based bounds.

Traditional methods for handling incomplete data, including Multiple Imputation and Maximum Likelihood, require that the data be Missing At Random (MAR). In most cases, however, missingness in a variable depends on the underlying value of that variable. In this work, we devise model-based methods to consistently estimate mean, variance and covariance given data that are Missing Not At Random (MNAR). While previous work on MNAR data require variables to be discrete, we extend the analysis to continuous variables drawn from Gaussian distributions. We demonstrate the merits of our techniques by comparing it empirically to state of the art software packages.

We propose a method for reliable prediction in multi-class classification, where reliability refers to the possibility of partial abstention in cases of uncertainty. More specifically, we allow for predictions in the form of preorder relations on the set of classes, thereby generalizing the idea of set-valued predictions. Our approach relies on combining learning by pairwise comparison with a recent proposal for modeling uncertainty in classification, in which a distinction is made between reducible (a.k.a. epistemic) uncertainty caused by a lack of information and irreducible (a.k.a. aleatoric) uncertainty due to intrinsic randomness. The problem of combining uncertain pairwise predictions into a most plausible preorder is then formalized as an integer programming problem. Experimentally, we show that our method is able to appropriately balance reliability and precision of predictions.

We consider the following nearest assignment problem (NAP): given a Bayesian network B and probability value q, find a configuration w of variables in B such that difference between q and the probability of w is minimized. NAP is much harder than conventional inference problems such as finding the most probable explanation and is NP-hard even on independent Bayesian networks (IBNs), which are networks having no edges. Therefore, in order to solve NAP on IBNs, we show how to encode it as a two-way number partitioning problem. This encoding allows us to use greedy poly-time approximation algorithms from the number partitioning literature to yield an algorithm with guarantees for solving NAP on IBNs. We extend this basic algorithm from independent networks to arbitrary probabilistic graphical models by leveraging cutset conditioning and (Rao-Blackwellised) sampling algorithms. We derive approximation and complexity guarantees for our new algorithms and show experimentally that they are quite accurate in practice.

We propose an approach for explaining Bayesian network classifiers, which is based on compiling such classifiers into decision functions that have a tractable and symbolic form. We introduce two types of explanations for why a classifier may have classified an instance positively or negatively and suggest algorithms for computing these explanations. The first type of explanation identifies a minimal set of the currently active features that is responsible for the current classification, while the second type of explanation identifies a minimal set of features whose current state (active or not) is sufficient for the classification. We consider in particular the compilation of Naive and Latent-Tree Bayesian network classifiers into Ordered Decision Diagrams (ODDs), providing a context for evaluating our proposal using case studies and experiments based on classifiers from the literature.

Complex causal networks underlie many real-world problems, from the regulatory interactions between genes to the environmental patterns used to understand climate change. Computational methods seek to infer these causal networks using observational data and domain knowledge. In this paper, we identify three key requirements for inferring the structure of causal networks for scientific discovery: (1) robustness to noise in observed measurements; (2) scalability to handle hundreds of variables; and (3) flexibility to encode domain knowledge and other structural constraints. We first formalize the problem of joint probabilistic causal structure discovery. We develop an approach using probabilistic soft logic (PSL) that exploits multiple statistical tests, supports efficient optimization over hundreds of variables, and can easily incorporate structural constraints, including imperfect domain knowledge. We compare our method against multiple well-studied approaches on biological and synthetic datasets, showing improvements of up to 20% in F1-score over the best performing baseline in realistic settings.

Counting the linear extensions of a given partial order not only has several applications in artificial intelligence but also represents a hard problem that challenges modern paradigms for approximate counting. Recently, Talvitie et al. (AAAI 2018) showed that an exponential time scheme beats the fastest known polynomial time schemes in practice, even if allowing hours of running time. Here, we present a novel scheme, relaxation Tootsie Pop, which in our experiments exhibits polynomial characteristics and significantly outperforms previous schemes. We also instantiate state-of-the-art model counters for CNF formulas; two natural encodings yield schemes that, however, are inferior to the more specialized schemes.

Prescriptive pricing is one of the most advanced pricing techniques, which derives the optimal price strategy to maximize the future profit/revenue by carrying out a two-stage process, demand modeling and price optimization.Demand modeling tries to reveal price-demand laws by discovering causal relationships among demands, prices, and objective factors, which is the foundation of price optimization.Existing methods either use regression or causal learning for uncovering the price-demand relations, but suffer from pain points in either accuracy/efficiency or mixed data type processing, while all of these are actual requirements in practical pricing scenarios.This paper proposes a novel demand modeling technique for practical usage.Speaking concretely, we propose a new locally consistent information criterion named MIC,and derive MIC-based inference algorithms for an accurate recovery of causal structure on mixed factor space.Experiments on simulate/real datasets show the superiority of our new approach in both price-demand law recovery and demand forecasting, as well as show promising performance in supporting optimal pricing.

In this paper, we provide an axiomatic justification for decision making with belief functions by studying the belief-function counterpart of Savage's Theorem where the state space is finite and the consequence set is a continuum [l, M] (l Keywords: Uncertainty in AI: Decision;Utility Theory Uncertainty in AI: Uncertainty in AI