| Total: 18

In order for conversational AI systems to hold more natural and broad-ranging conversations, they will require much more commonsense, including the ability to identify unstated presumptions of their conversational partners. For example, in the command "If it snows at night then wake me up early because I don't want to be late for work" the speaker relies on commonsense reasoning of the listener to infer the implicit presumption that they wish to be woken only if it snows enough to cause traffic slowdowns. We consider here the problem of understanding such imprecisely stated natural language commands given in the form of if-(state), then-(action), because-(goal) statements. More precisely, we consider the problem of identifying the unstated presumptions of the speaker that allow the requested action to achieve the desired goal from the given state (perhaps elaborated by making the implicit presumptions explicit). We release a benchmark data set for this task, collected from humans and annotated with commonsense presumptions. We present a neuro-symbolic theorem prover that extracts multi-hop reasoning chains, and apply it to this problem. Furthermore, to accommodate the reality that current AI commonsense systems lack full coverage, we also present an interactive conversational framework built on our neuro-symbolic system, that conversationally evokes commonsense knowledge from humans to complete its reasoning chains.

Despite the prevalence of deep neural networks, their single most cited drawback is that, even when successful, their operations are inscrutable. For many applications, the desired outputs are the composition of externally-defined bases. For such decomposable domains, we present a two-stage learning procedure producing combinations of the external bases which are trivially extractable from the network. In the first stage, the set of external bases that will form the solution are modeled as differentiable generator modules, controlled by the same parameters as the external bases. In the second stage, a controller network is created that selects parameters for those generators, either successively or in parallel, to compose the final solution. Through three tasks, we concretely demonstrate how our system yields readily understandable commands. In one, we introduce a new form of artistic style transfer, learning to draw and color with crayons, in which the transformation of a photograph or painting occurs not as a single monolithic computation, but by the composition of thousands of individual, visualizable strokes. The other two tasks, single-pass function approximation with arbitrary bases and shape-based synthesis, show how our approach produces understandable and extractable actions in two disparate domains.

Understanding narratives requires reasoning about implicit world knowledge related to the causes, effects, and states of situations described in text. At the core of this challenge is how to access contextually relevant knowledge on demand and reason over it. In this paper, we present initial studies toward zero-shot commonsense question answering by formulating the task as inference over dynamically generated commonsense knowledge graphs. In contrast to previous studies for knowledge integration that rely on retrieval of existing knowledge from static knowledge graphs, our study requires commonsense knowledge integration where contextually relevant knowledge is often not present in existing knowledge bases. Therefore, we present a novel approach that generates contextually-relevant symbolic knowledge structures on demand using generative neural commonsense knowledge models. Empirical results on two datasets demonstrate the efficacy of our neuro-symbolic approach for dynamically constructing knowledge graphs for reasoning. Our approach achieves significant performance boosts over pretrained language models and vanilla knowledge models, all while providing interpretable reasoning paths for its predictions.

Neural networks have been the key to solve a variety of different problems. However, neural network models are still regarded as black boxes, since they do not provide any human-interpretable evidence as to why they output a certain result. We address this issue by leveraging on ontologies and building small classifiers that map a neural network model's internal state to concepts from an ontology, enabling the generation of symbolic justifications for the output of neural network models. Using an image classification problem as testing ground, we discuss how to map the internal state of a neural network to the concepts of an ontology, examine whether the results obtained by the established mappings match our understanding of the mapped concepts, and analyze the justifications obtained through this method.

Width-based planning methods have been shown to yield state-of-the-art performance in the Atari 2600 domain using pixel input. One successful approach, RolloutIW, represents states with the B-PROST boolean feature set. An augmented version of RolloutIW, pi-IW, shows that learned features can be competitive with handcrafted ones for width-based search. In this paper, we leverage variational autoencoders (VAEs) to learn features directly from pixels in a principled manner, and without supervision. The inference model of the trained VAEs extracts boolean features from pixels, and RolloutIW plans with these features. The resulting combination outperforms the original RolloutIW and human professional play on Atari 2600 and drastically reduces the size of the feature set.

For prediction of interacting agents' trajectories, we propose an end-to-end trainable architecture that hybridizes neural nets with game-theoretic reasoning, has interpretable intermediate representations, and transfers to downstream decision making. It uses a net that reveals preferences from the agents' past joint trajectory, and a differentiable implicit layer that maps these preferences to local Nash equilibria, forming the modes of the predicted future trajectory. Additionally, it learns an equilibrium refinement concept. For tractability, we introduce a new class of continuous potential games and an equilibrium-separating partition of the action space. We provide theoretical results for explicit gradients and soundness. In experiments, we evaluate our approach on two real-world data sets, where we predict highway drivers' merging trajectories, and on a simple decision-making transfer task.

Previous neural solvers of math word problems (MWPs) are learned with full supervision and fail to generate diverse solutions. In this paper, we address this issue by introducing a weakly-supervised paradigm for learning MWPs. Our method only requires the annotations of the final answers and can generate various solutions for a single problem. To boost weakly-supervised learning, we propose a novel learning-by-fixing (LBF) framework, which corrects the misperceptions of the neural network via symbolic reasoning. Specifically, for an incorrect solution tree generated by the neural network, the fixing mechanism propagates the error from the root node to the leaf nodes and infers the most probable fix that can be executed to get the desired answer. To generate more diverse solutions, tree regularization is applied to guide the efficient shrinkage and exploration of the solution space, and a memory buffer is designed to track and save the discovered various fixes for each problem. Experimental results on the Math23K dataset show the proposed LBF framework significantly outperforms reinforcement learning baselines in weakly-supervised learning. Furthermore, it achieves comparable top-1 and much better top-3/5 answer accuracies than fully-supervised methods, demonstrating its strength in producing diverse solutions.

Representation learning for knowledge graphs (KGs) has focused on the problem of answering simple link prediction queries. In this work we address the more ambitious challenge of predicting the answers of conjunctive queries with multiple missing entities. We propose Bidirectional Query Embedding (BiQE), a method that embeds conjunctive queries with models based on bi-directional attention mechanisms. Contrary to prior work, bidirectional self-attention can capture interactions among all the elements of a query graph. We introduce two new challenging datasets for studying conjunctive query inference and conduct experiments on several benchmark datasets that demonstrate BiQE significantly outperforms state of the art baselines.

Labeling training examples at scale is a perennial challenge in machine learning. Self-supervision methods compensate for the lack of direct supervision by leveraging prior knowledge to automatically generate noisy labeled examples. Deep probabilistic logic (DPL) is a unifying framework for self-supervised learning that represents unknown labels as latent variables and incorporates diverse self-supervision using probabilistic logic to train a deep neural network end-to-end using variational EM. While DPL is successful at combining pre-specified self-supervision, manually crafting self-supervision to attain high accuracy may still be tedious and challenging. In this paper, we propose Self-Supervised Self-Supervision (S4), which adds to DPL the capability to learn new self-supervision automatically. Starting from an initial "seed," S4 iteratively uses the deep neural network to propose new self-supervision. These are either added directly (a form of structured self-training) or verified by a human expert (as in feature-based active learning). Experiments show that S4 is able to automatically propose accurate self-supervision and can often nearly match the accuracy of supervised methods with a tiny fraction of the human effort.

Explaining the predictions of neural black-box models is an important problem, especially when such models are used in applications where user trust is crucial. Estimating the influence of training examples on a learned neural model's behavior allows us to identify training examples most responsible for a given prediction and, therefore, to faithfully explain the output of a black-box model. The most generally applicable existing method is based on influence functions, which scale poorly for larger sample sizes and models. We propose gradient rollback, a general approach for influence estimation, applicable to neural models where each parameter update step during gradient descent touches a smaller number of parameters, even if the overall number of parameters is large. Neural matrix factorization models trained with gradient descent are part of this model class. These models are popular and have found a wide range of applications in industry. Especially knowledge graph embedding methods, which belong to this class, are used extensively. We show that gradient rollback is highly efficient at both training and test time. Moreover, we show theoretically that the difference between gradient rollback's influence approximation and the true influence on a model's behavior is smaller than known bounds on the stability of stochastic gradient descent. This establishes that gradient rollback is robustly estimating example influence. We also conduct experiments which show that gradient rollback provides faithful explanations for knowledge base completion and recommender datasets. An implementation and an appendix are available.

Driven by deep neural networks (DNN), the recent development of computer vision makes vision sensors such as stereo cameras and Lidars ubiquitous in autonomous cars, robotics and traffic monitoring. However, a traditional DNN-based data fusion pipeline like object tracking has to hard-wire an engineered set of DNN models to a fixed processing logic, which makes it difficult to infuse new models to that pipeline. To overcome this, we propose a novel neural-symbolic stream reasoning approach realised by semantic stream reasoning programs which specify DNN-based data fusion pipelines via logic rules with learnable probabilistic degrees as weights. The reasoning task over this program is governed by a novel incremental reasoning algorithm, which lends itself also as a core building block for a scalable and parallel algorithm to learn the weights for such program. Extensive experiments with our first prototype on multi-object tracking benchmarks for autonomous driving and traffic monitoring show that our flexible approach can considerably improve both accuracy and processing throughput compared to the DNN-based counterparts.

Automated mathematical reasoning is a challenging problem that requires an agent to learn algebraic patterns that contain long-range dependencies. Two particular tasks that test this type of reasoning are (1)mathematical equation verification,which requires determining whether trigonometric and linear algebraic statements are valid identities or not, and (2)equation completion, which entails filling in a blank within an expression to make it true. Solving these tasks with deep learning requires that the neural model learn how to manipulate and compose various algebraic symbols, carrying this ability over to previously unseen expressions. Artificial neural net-works, including recurrent networks and transformers, struggle to generalize on these kinds of difficult compositional problems, often exhibiting poor extrapolation performance.In contrast, recursive neural networks (recursive-NNs) are,theoretically, capable of achieving better extrapolation due to their tree-like design but are very difficult to optimize as the depth of their underlying tree structure increases. To over-come this, we extend recursive-NNs to utilize multiplicative,higher-order synaptic connections and, furthermore, to learn to dynamically control and manipulate an external memory.We argue that this key modification gives the neural system the ability to capture powerful transition functions for each possible input. We demonstrate the effectiveness of our pro-posed higher-order, memory-augmented recursive-NN models on two challenging mathematical equation tasks, showing improved extrapolation, stable performance, and faster convergence. We show that our models achieve 1.53% average improvement over current state-of-the-art methods in equation verification and achieve 2.22% top-1 average accuracy and 2.96% top-5 average accuracy for equation completion.

Automated planning with neural network transition models is a two stage approach to solving planning problems with unknown transition models. The first stage of the approach learns the unknown transition model from data as a neural network model, and the second stage of the approach compiles the learned model to either a Mixed-Integer Linear Programming (MILP) model or a Recurrent Neural Network (RNN) model, and optimize it using an off-the-shelf solver. The previous studies have shown that both models have their advantages and disadvantages. Namely, the MILP model can be solved optimally using a branch-and-bound algorithm but has been experimentally shown not to scale well for neural networks with multiple hidden layers. In contrast, the RNN model can be solved effectively using a gradient descent algorithm but can only work under very restrictive assumptions. In this paper, we focus on improving the effectiveness of solving the second stage of the approach by introducing (i) a novel Lagrangian RNN architecture that can model the previously ignored components of the planning problem as Lagrangian functions, and (ii) a novel framework that unifies the MILP and the Lagrangian RNN models such that the weakness of one model is complemented by the strength of the other. Experimentally, we show that our unifying framework significantly outperforms the standalone MILP model by solving 80% more problem instances, and showcase the ability of our unifying framework to find high quality solutions to challenging automated planning problems with unknown transition models.

A major challenge in scene graph classification is that the appearance of objects and relations can be significantly different from one image to another. Previous works have addressed this by relational reasoning over all objects in an image or incorporating prior knowledge into classification. Unlike previous works, we do not consider separate models for perception and prior knowledge. Instead, we take a multi-task learning approach by introducing schema representations and implementing the classification as an attention layer between image-based representations and the schemata. This allows for the prior knowledge to emerge and propagate within the perception model. By enforcing the model also to represent the prior, we achieve a strong inductive bias. We show that our model can accurately generate commonsense knowledge and that the iterative injection of this knowledge to scene representations, as a top-down mechanism, leads to significantly higher classification performance. Additionally, our model can be fine-tuned on external knowledge given as triples. When combined with self-supervised learning and with 1% of annotated images only, this gives more than 3% improvement in object classification, 26% in scene graph classification, and 36% in predicate prediction accuracy.

The differentiable implementation of logic yields a seamless combination of symbolic reasoning and deep neural networks. Recent research, which has developed a differentiable framework to learn logic programs from examples, can even acquire reasonable solutions from noisy datasets. However, this framework severely limits expressions for solutions, e.g., no function symbols are allowed, and the shapes of clauses are fixed. As a result, the framework cannot deal with structured examples. Therefore we propose a new framework to learn logic programs from noisy and structured examples, including the following contributions. First, we propose an adaptive clause search method by looking through structured space, which is defined by the generality of the clauses, to yield an efficient search space for differentiable solvers. Second, we propose for ground atoms an enumeration algorithm, which determines a necessary and sufficient set of ground atoms to perform differentiable inference functions. Finally, we propose a new method to compose logic programs softly, enabling the system to deal with complex programs consisting of several clauses. Our experiments show that our new framework can learn logic programs from noisy and structured examples, such as sequences or trees. Our framework can be scaled to deal with complex programs that consist of several clauses with function symbols.

Deep reinforcement learning has been successful in a variety of tasks, such as game playing and robotic manipulation. However, attempting to learn tabula rasa disregards the logical structure of many domains as well as the wealth of readily available knowledge from domain experts that could help "warm start" the learning process. We present a novel reinforcement learning technique that allows for intelligent initialization of a neural network weights and architecture. Our approach permits the encoding domain knowledge directly into a neural decision tree, and improves upon that knowledge with policy gradient updates. We empirically validate our approach on two OpenAI Gym tasks and two modified StarCraft 2 tasks, showing that our novel architecture outperforms multilayer-perceptron and recurrent architectures. Our knowledge-based framework finds superior policies compared to imitation learning-based and prior knowledge-based approaches. Importantly, we demonstrate that our approach can be used by untrained humans to initially provide >80% increase in expected reward relative to baselines prior to training (p < 0.001), which results in a >60% increase in expected reward after policy optimization (p = 0.011).

Despite significant progress in the development of neural-symbolic frameworks, the question of how to integrate a neural and a symbolic system in a compositional manner remains open. Our work seeks to fill this gap by treating these two systems as black boxes to be integrated as modules into a single architecture, without making assumptions on their internal structure and semantics. Instead, we expect only that each module exposes certain methods for accessing the functions that the module implements: the symbolic module exposes a deduction method for computing the function's output on a given input, and an abduction method for computing the function's inputs for a given output; the neural module exposes a deduction method for computing the function's output on a given input, and an induction method for updating the function given input-output training instances. We are, then, able to show that a symbolic module --- with any choice for syntax and semantics, as long as the deduction and abduction methods are exposed --- can be cleanly integrated with a neural module, and facilitate the latter's efficient training, achieving empirical performance that exceeds that of previous work.

Machine teaching is an algorithmic framework for teaching a target hypothesis via a sequence of examples or demonstrations. We investigate machine teaching for temporal logic formulas—a novel and expressive hypothesis class amenable to time-related task specifications. In the context of teaching temporal logic formulas, an exhaustive search even for a myopic solution takes exponential time (with respect to the time span of the task). We propose an efficient approach for teaching parametric linear temporal logic formulas. Concretely, we derive a necessary condition for the minimal time length of a demonstration to eliminate a set of hypotheses. Utilizing this condition, we propose an efficient myopic teaching algorithm by solving a sequence of integer programming problems. We further show that, under two notions of teaching complexity, the proposed algorithm has near-optimal performance. We evaluate our algorithm extensively under different classes of learners (i.e., learners with different preferences over hypotheses) and interaction protocols (e.g., non-adaptive and adaptive). Our results demonstrate the effectiveness of the proposed algorithm in teaching temporal logic formulas; in particular, we show that there are significant gains of teaching efficacy when the teacher adapts to feedback of the learner, or adapts to a (non-myopic) oracle.