Recent findings suggest that humans deploy a cognitive mechanism akin to a physics simulation engine to simulate the physics of objects. We propose a framework in which bots deploy probabilistic programming tools to interact with intuitive physics environments. The framework uses a physics simulator in a probabilistic way to reason about moves performed by an agent in a setting governed by Newtonian laws of motion. However, probabilistic programming methods can be slow in such settings because they must generate many samples. We therefore complement the model with a model-free approach that makes the sampling procedure more efficient by learning from experience during game play. By combining a model-free component (a convolutional neural network) with a model-based component (probabilistic physics simulation), the approach achieves what neither could alone, outperforming purely model-free and purely model-based baselines. We present a case study with empirical results on the game of Flappy Bird.
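A minimal sketch of the kind of hybrid decision rule this abstract describes, under illustrative assumptions: the physics constants, the gap geometry, the mixing weight, and the model-free prior (which would come from a CNN) are all hypothetical, not the authors' implementation.

```python
# Hedged sketch: blend a model-free action prior with model-based probabilistic
# rollouts of a simplified Flappy Bird physics model. y increases downward here.
import numpy as np

GRAVITY, FLAP_IMPULSE, DT = 9.8, -3.0, 0.05  # assumed Newtonian dynamics

def rollout_survival(y, vy, flap, gap_lo, gap_hi, n_samples=100, horizon=20, noise=0.1):
    """Estimate P(survive) for one action by sampling noisy physics rollouts."""
    rng = np.random.default_rng(0)
    alive = 0
    for _ in range(n_samples):
        yy, vv = y, vy + (FLAP_IMPULSE if flap else 0.0)
        ok = True
        for _ in range(horizon):
            vv += GRAVITY * DT + rng.normal(0.0, noise)   # uncertain dynamics
            yy += vv * DT
            if not (gap_lo < yy < gap_hi):                # hit a pipe, ground, or ceiling
                ok = False
                break
        alive += ok
    return alive / n_samples

def choose_action(y, vy, gap_lo, gap_hi, prior_flap=0.5, w=0.7):
    """Mix model-based survival estimates with a model-free prior
    (here a placeholder probability standing in for a CNN output)."""
    p_flap = rollout_survival(y, vy, True, gap_lo, gap_hi)
    p_wait = rollout_survival(y, vy, False, gap_lo, gap_hi)
    score_flap = w * p_flap + (1 - w) * prior_flap
    score_wait = w * p_wait + (1 - w) * (1 - prior_flap)
    return "flap" if score_flap > score_wait else "wait"

print(choose_action(y=1.0, vy=0.5, gap_lo=0.0, gap_hi=2.0))
```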
In decision-making tasks under uncertainty, humans display characteristic biases in seeking, integrating, and acting upon information relevant to the task. Here, we reexamine data from previous carefully designed experiments, collected at scale, that measured and catalogued these biases in aggregate form. We design deep learning models that replicate these biases in aggregate while also capturing individual variation in behavior. A key finding of our work is that the paucity of data collected from each individual subject can be overcome by sampling large numbers of subjects from the population, while still capturing individual differences. We predict human behavior with high accuracy without making any assumptions about task goals, reward structure, or individual biases, thus providing a model-agnostic fit to human behavior in the task. Such an approach can sidestep potential limitations in modeler-specified inductive biases, and it has implications for computational modeling of human cognitive function in general and of human-AI interfaces in particular.
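One way to realize "pool across subjects while capturing individual variation" is a per-subject embedding feeding a shared network; this is an illustrative assumption, not the paper's architecture, and all names and dimensions below are hypothetical.

```python
# Illustrative sketch: a population-level choice model with a learned
# per-subject embedding that absorbs individual differences.
import torch
import torch.nn as nn

class PopulationChoiceModel(nn.Module):
    def __init__(self, n_subjects, n_features, d_subj=8, d_hidden=64, n_actions=2):
        super().__init__()
        self.subject_emb = nn.Embedding(n_subjects, d_subj)  # individual variation
        self.net = nn.Sequential(
            nn.Linear(n_features + d_subj, d_hidden), nn.ReLU(),
            nn.Linear(d_hidden, n_actions),
        )

    def forward(self, subject_id, trial_features):
        z = self.subject_emb(subject_id)                          # who is deciding
        return self.net(torch.cat([trial_features, z], dim=-1))   # choice logits

# Toy usage: 100 subjects, 5 trial features, binary choice.
model = PopulationChoiceModel(n_subjects=100, n_features=5)
logits = model(torch.tensor([3, 7]), torch.randn(2, 5))
print(logits.shape)  # torch.Size([2, 2])
```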
In this paper, we propose a normative approach to modeling apparently irrational human decision making (cognitive biases) that makes use of inherently rational computational mechanisms. We view preferential choice tasks as sequential decision-making problems and formulate them as Partially Observable Markov Decision Processes (POMDPs). The resulting sequential decision model learns what information to gather about which options, whether to calculate option values or make comparisons between options, and when to make a choice. We apply the model to choice problems where context is known to influence human choice, an effect that has been taken as evidence that human cognition is irrational. Our results show that the new model approximates a bounded-optimal cognitive policy and makes quantitative predictions that correspond well to evidence about human choice. Furthermore, the model uses context to help infer which option has the maximum expected value while taking into account computational cost and cognitive limits. In addition, it predicts when, and explains why, people stop accumulating evidence and make a decision. We argue that the model provides evidence that apparent human irrationalities are emergent consequences of processes that prefer higher-value (rational) policies.
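A hedged sketch of the underlying idea: a choice trial as sequential belief updating over option values, with a simple cost-sensitive stopping heuristic standing in for the learned POMDP policy. The noise levels, sampling cost, and stopping rule are assumptions for illustration only.

```python
# Sketch: gather noisy evidence about options, update Gaussian beliefs,
# and stop when further computation is unlikely to be worth its cost.
import numpy as np

def choose(true_values, obs_noise=1.0, sample_cost=0.02, max_samples=50, seed=0):
    rng = np.random.default_rng(seed)
    n = len(true_values)
    mu = np.zeros(n)            # belief means over option values
    prec = np.ones(n)           # belief precisions (1 / variance)
    for t in range(max_samples):
        i = int(np.argmax(1.0 / prec))              # inspect the most uncertain option
        x = true_values[i] + rng.normal(0, obs_noise)
        obs_prec = 1.0 / obs_noise**2
        mu[i] = (prec[i] * mu[i] + obs_prec * x) / (prec[i] + obs_prec)
        prec[i] += obs_prec
        # stop when the leading option's advantage dwarfs remaining uncertainty
        best, second = np.sort(mu)[-1], np.sort(mu)[-2]
        if (best - second) > sample_cost * (t + 1) + 2.0 / np.sqrt(prec.min()):
            break
    return int(np.argmax(mu)), t + 1

choice, n_samples = choose(np.array([0.4, 0.5, 0.45]))
print(choice, n_samples)
```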
Visual relation detection is currently one of the most popular problems in visual understanding. Many deep-learning models have been designed for relation detection on images and have achieved impressive results. However, deep-learning models have several serious shortcomings, including poor training efficiency and a lack of explainability. Psychologists have ample evidence that analogy is central to human learning and reasoning, including visual reasoning. This paper introduces a new hybrid system for visual relation detection that combines deep-learning models with analogical generalization. Object bounding boxes and masks are detected using deep-learning models, and analogical generalization over qualitative representations is used to detect visual relations between object pairs. Experiments on the Visual Relation Detection dataset indicate that our hybrid system achieves comparable results on the task while being more training-efficient and more explainable than pure deep-learning models.
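A small sketch of one plausible step in such a pipeline, under assumptions: converting two detected bounding boxes into qualitative spatial facts of the kind an analogical generalization stage could reason over. The predicate names and box format are hypothetical, not taken from the paper.

```python
# Sketch: qualitative spatial relations from two detected bounding boxes.
def qualitative_relations(box_a, box_b):
    """Boxes are (x1, y1, x2, y2) in image coordinates (y grows downward)."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    facts = []
    if ax2 < bx1: facts.append("leftOf(A, B)")
    if bx2 < ax1: facts.append("rightOf(A, B)")
    if ay2 < by1: facts.append("above(A, B)")
    if by2 < ay1: facts.append("below(A, B)")
    if ax1 <= bx1 and ay1 <= by1 and ax2 >= bx2 and ay2 >= by2:
        facts.append("contains(A, B)")
    overlap_w = min(ax2, bx2) - max(ax1, bx1)
    overlap_h = min(ay2, by2) - max(ay1, by1)
    if overlap_w > 0 and overlap_h > 0:
        facts.append("overlaps(A, B)")
    return facts

# e.g. a person (A) detected to the left of a skateboard (B)
print(qualitative_relations((10, 10, 50, 120), (60, 100, 140, 140)))
```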
Analogy is core to human cognition. It allows us to solve problems based on prior experience, it governs the way we conceptualize new information, and it even influences our visual perception. The importance of analogy to humans has made it an active area of research in the broader field of artificial intelligence, resulting in data-efficient models that learn and reason in human-like ways. While cognitive perspectives of analogy and deep learning have generally been studied independently of one another, the integration of the two lines of research is a promising step towards more robust and efficient learning techniques. As part of a growing body of research on such an integration, we introduce the Analogical Matching Network: a neural architecture that learns to produce analogies between structured, symbolic representations that are largely consistent with the principles of Structure-Mapping Theory.
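For readers unfamiliar with Structure-Mapping Theory, the toy example below illustrates the matching constraint the abstract alludes to: aligning relational facts between a base and a target under one-to-one, structurally consistent entity correspondences. It is a brute-force sketch for intuition only, not the Analogical Matching Network.

```python
# Toy structure-mapping step: find the one-to-one entity binding that makes
# the most base relations hold in the target.
from itertools import permutations

def score_mapping(base, target, binding):
    """Count base facts whose relation and bound arguments appear in the target."""
    mapped = {(rel, tuple(binding[a] for a in args)) for rel, args in base}
    return sum(1 for rel, args in target if (rel, tuple(args)) in mapped)

def best_mapping(base, target):
    base_ents = sorted({a for _, args in base for a in args})
    targ_ents = sorted({a for _, args in target for a in args})
    best, best_score = None, -1
    for perm in permutations(targ_ents, len(base_ents)):   # one-to-one bindings
        binding = dict(zip(base_ents, perm))
        s = score_mapping(base, target, binding)
        if s > best_score:
            best, best_score = binding, s
    return best, best_score

solar = [("attracts", ("sun", "planet")), ("revolvesAround", ("planet", "sun"))]
atom  = [("attracts", ("nucleus", "electron")), ("revolvesAround", ("electron", "nucleus"))]
print(best_mapping(solar, atom))  # expect sun -> nucleus, planet -> electron
```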
Human behavior is the confluence of output from voluntary and involuntary motor systems. The neural activities that mediate behavior, from individual cells to distributed networks, are in a state of constant flux. Artificial intelligence (AI) research over the past decade shows that behavior, in the form of facial muscle activity, can reveal information about fleeting voluntary and involuntary motor system activity related to emotion, pain, and deception. However, AI algorithms often lack an explanation for their decisions, and learning meaningful representations requires large datasets labeled by a subject-matter expert. Motivated by the success of using facial muscle movements to classify brain states and by the importance of learning from small amounts of data, we propose an explainable self-supervised representation-learning paradigm that learns meaningful temporal facial muscle movement patterns from limited samples. We validate our methodology by carrying out a comprehensive empirical study to predict future speech behavior in a real-world dataset of adults who stutter (AWS). Our explainability study found that facial muscle movements around the eyes (p<0.001) and lips (p<0.001) differ significantly before fluent vs. disfluent speech production. Evaluations on the AWS dataset demonstrate that the proposed self-supervised approach achieves at least a 2.51% accuracy improvement over fully supervised approaches.
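The abstract does not specify the pretext task, so the sketch below uses a generic self-supervised objective (predicting whether two facial-muscle-movement windows are temporally adjacent) purely to illustrate how representations can be learned without expert labels. Window sizes, the number of action units, and the encoder are assumptions.

```python
# Sketch: self-supervised pretraining on facial action-unit time series.
import torch
import torch.nn as nn

class AUEncoder(nn.Module):
    def __init__(self, n_aus=17, d=32):
        super().__init__()
        self.gru = nn.GRU(n_aus, d, batch_first=True)
    def forward(self, x):              # x: (batch, time, n_aus)
        _, h = self.gru(x)
        return h[-1]                   # (batch, d) window embedding

encoder = AUEncoder()
head = nn.Linear(64, 1)                # adjacency classifier over a pair of embeddings
opt = torch.optim.Adam(list(encoder.parameters()) + list(head.parameters()), lr=1e-3)
loss_fn = nn.BCEWithLogitsLoss()

# Toy batch: pairs of 50-frame windows of 17 action units,
# labeled 1 if the windows are temporally adjacent, 0 otherwise.
w1, w2 = torch.randn(8, 50, 17), torch.randn(8, 50, 17)
labels = torch.randint(0, 2, (8, 1)).float()

logits = head(torch.cat([encoder(w1), encoder(w2)], dim=-1))
loss = loss_fn(logits, labels)
loss.backward()
opt.step()
print(float(loss))
```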
Video sentiment analysis as a decision-making process is inherently complex, involving the fusion of decisions from multiple modalities and the cognitive biases this fusion induces. Inspired by recent advances in quantum cognition, we show that the sentiment judgment from one modality can be incompatible with the judgment from another, i.e., the order matters and the two cannot be jointly measured to produce a final decision. The cognitive process thus exhibits "quantum-like" biases that cannot be captured by classical probability theories. Accordingly, we propose a fundamentally new, quantum cognitively motivated fusion strategy for predicting sentiment judgments. In particular, we formulate utterances as quantum superposition states of positive and negative sentiment judgments, and uni-modal classifiers as mutually incompatible observables, on a complex-valued Hilbert space with positive-operator-valued measures. Experiments on two benchmark datasets show that our model significantly outperforms various existing decision-level fusion approaches and a range of state-of-the-art content-level fusion approaches. The results also show that the concept of incompatibility allows effective handling of all combination patterns, including extreme cases that are wrongly predicted by all uni-modal classifiers.
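A worked toy example of the incompatibility the abstract describes, not the paper's model: an utterance as a superposition of positive/negative sentiment and two uni-modal "judgments" as projective measurements in different bases. The angles are arbitrary; the point is that non-commuting projectors make the order of judgments matter.

```python
# Order effects from incompatible (non-commuting) measurements.
import numpy as np

def projector(theta):
    v = np.array([np.cos(theta), np.sin(theta)])
    return np.outer(v, v)                       # "positive" outcome for this modality

psi = np.array([np.cos(0.3), np.sin(0.3)])      # utterance state (superposition)
P_text  = projector(0.0)                        # textual modality's positive outcome
P_audio = projector(np.pi / 5)                  # acoustic modality's positive outcome

def prob_then(P1, P2, state):
    """P(first judgment positive, then second judgment positive)."""
    p1 = state @ P1 @ state
    collapsed = (P1 @ state) / np.sqrt(p1)      # state after the first judgment
    return p1 * (collapsed @ P2 @ collapsed)

print(prob_then(P_text, P_audio, psi))   # text judged first
print(prob_then(P_audio, P_text, psi))   # audio judged first -> a different value
```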
We address the black-box issue of VR sickness assessment (VRSA) by evaluating the levels of the physical symptoms of VR sickness. Even for VR content that induces a similar overall level of VR sickness, the physical symptoms can vary depending on the characteristics of the content. Most existing VRSA methods focus on assessing an overall VR sickness score. To better understand VR sickness, it is necessary to predict and report the levels of its major symptoms rather than only the overall degree of VR sickness. In this paper, we predict the degrees of the main physical symptoms affecting the overall degree of VR sickness: disorientation, nausea, and oculomotor. In addition, we introduce a new large-scale VRSA dataset that includes 360 videos with various frame rates, physiological signals, and subjective scores. On the VRSA benchmark and our newly collected dataset, our approach shows the potential not only to achieve the highest correlation with subjective scores, but also to better reveal which symptoms are the main causes of VR sickness.
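A simple sketch of the task formulation (an assumed architecture, not the paper's): instead of regressing a single overall sickness score, predict one score per symptom from a feature vector derived from the content and physiological signals. The feature dimension and head layout are hypothetical.

```python
# Multi-head symptom regression: disorientation, nausea, oculomotor.
import torch
import torch.nn as nn

class SymptomHeads(nn.Module):
    def __init__(self, d_feat=256, d_hidden=128):
        super().__init__()
        self.trunk = nn.Sequential(nn.Linear(d_feat, d_hidden), nn.ReLU())
        self.heads = nn.ModuleDict({
            s: nn.Linear(d_hidden, 1) for s in ("disorientation", "nausea", "oculomotor")
        })
    def forward(self, feats):
        h = self.trunk(feats)
        return {s: head(h).squeeze(-1) for s, head in self.heads.items()}

model = SymptomHeads()
scores = model(torch.randn(4, 256))           # 4 clips, 256-dim assumed features
overall = sum(scores.values()) / len(scores)  # one simple way to aggregate an overall score
print({k: v.shape for k, v in scores.items()}, overall.shape)
```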
The ability to perceive and reason about social interactions in the context of physical environments is core to human social intelligence and human-machine cooperation. However, no prior dataset or benchmark has systematically evaluated physically grounded perception of complex social interactions that go beyond short actions, such as high-fiving, or simple group activities, such as gathering. In this work, we create a dataset of physically grounded abstract social events, PHASE, that resembles a wide range of real-life social interactions by including social concepts such as helping another agent. PHASE consists of 2D animations of pairs of agents moving in a continuous space, generated procedurally using a physics engine and a hierarchical planner. Agents have a limited field of view and can interact with multiple objects in an environment that has multiple landmarks and obstacles. Using PHASE, we design a social recognition task and a social prediction task. PHASE is validated with human experiments demonstrating that humans perceive rich interactions in the social events, and that the simulated agents behave similarly to humans. As a baseline model, we introduce a Bayesian inverse planning approach, SIMPLE (SIMulation, Planning and Local Estimation), which outperforms state-of-the-art feed-forward neural networks. We hope that PHASE can serve as a difficult new challenge for developing new models that can recognize complex social interactions.
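The sketch below illustrates Bayesian inverse planning in the spirit of the baseline described here; the noisily-rational step likelihood, the candidate goals, and the rationality parameter are assumptions, not SIMPLE's actual components.

```python
# Infer an agent's goal from its trajectory: P(goal | trajectory) is proportional
# to how consistently each step makes progress toward that goal.
import numpy as np

def goal_posterior(trajectory, goals, beta=3.0):
    """trajectory: list of 2D positions; goals: dict name -> 2D position."""
    log_post = {g: 0.0 for g in goals}
    for (x0, y0), (x1, y1) in zip(trajectory[:-1], trajectory[1:]):
        for g, (gx, gy) in goals.items():
            progress = np.hypot(gx - x0, gy - y0) - np.hypot(gx - x1, gy - y1)
            log_post[g] += beta * progress        # softmax-rational step score (unnormalized)
    z = np.log(sum(np.exp(v) for v in log_post.values()))
    return {g: float(np.exp(v - z)) for g, v in log_post.items()}

traj = [(0, 0), (1, 0.2), (2, 0.3), (3, 0.4)]       # mostly moving right
goals = {"help_other_agent": (5, 1), "reach_landmark": (0, 5)}
print(goal_posterior(traj, goals))                   # mass concentrates on the rightward goal
```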
Modeling non-linear data as symmetric positive definite (SPD) matrices on Riemannian manifolds has attracted much attention for various classification tasks. In the context of deep learning, SPD matrix-based Riemannian networks have been shown to be a promising solution for classifying electroencephalogram (EEG) signals, capturing the Riemannian geometry within their structured 2D feature representation. However, existing approaches usually learn spatial-temporal structures in a single embedding space for all available EEG signals, and their optimization procedures rely on computationally expensive iterations. Furthermore, these approaches often struggle to encode all of the various types of relationships into a single distance metric, resulting in a loss of generality. To address these limitations, we propose a Riemannian Embedding Banks (REB) method, which divides the problem of learning common spatial patterns in an entire embedding space into K subproblems and builds one model for each subproblem, to be combined with SPD neural networks. Leveraging the "separate to learn" strategy on a Riemannian manifold, REB divides the data and the embedding space into K non-overlapping subsets and learns K separate distance metrics in a Riemannian geometric space rather than a vector space. The learned K non-overlapping subsets are then grouped into neurons in the SPD neural network's embedding layer. Experimental results on public EEG datasets demonstrate the superiority of the proposed approach for learning common spatial patterns of EEG signals despite their non-stationary nature, increasing the convergence speed while maintaining generalization.
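A hedged sketch of the "separate to learn" idea, under assumptions that are not the paper's REB implementation: map SPD covariance matrices of EEG trials into a tangent space via the matrix logarithm, then split them into K non-overlapping banks with plain k-means using the log-Euclidean metric.

```python
# Partition SPD matrices into K banks in a log-Euclidean tangent space.
import numpy as np
from scipy.linalg import logm

def log_euclidean_embed(spd_matrices):
    """Vectorize each SPD matrix by its matrix logarithm (upper triangle)."""
    embs = []
    for S in spd_matrices:
        L = logm(S).real
        iu = np.triu_indices_from(L)
        embs.append(L[iu])
    return np.stack(embs)

def split_into_banks(embeddings, K=3, iters=20, seed=0):
    """Plain k-means over tangent-space embeddings -> K non-overlapping subsets."""
    rng = np.random.default_rng(seed)
    centers = embeddings[rng.choice(len(embeddings), K, replace=False)]
    for _ in range(iters):
        d = np.linalg.norm(embeddings[:, None] - centers[None], axis=-1)
        assign = d.argmin(axis=1)
        centers = np.stack([embeddings[assign == k].mean(axis=0)
                            if np.any(assign == k) else centers[k] for k in range(K)])
    return assign

# Toy data: random SPD matrices standing in for EEG spatial covariances.
rng = np.random.default_rng(1)
spd = [(lambda A: A @ A.T + 0.1 * np.eye(8))(rng.normal(size=(8, 8))) for _ in range(30)]
banks = split_into_banks(log_euclidean_embed(spd), K=3)
print(np.bincount(banks))   # sizes of the K banks
```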
Human emotion decoding in affective brain-computer interfaces suffers a major setback due to the inter-subject variability of electroencephalography (EEG) signals. Existing approaches usually require amassing extensive EEG data from each new subject, which is prohibitively time-consuming and makes for a poor user experience. To tackle this issue, we divide EEG representations into private components specific to each subject and shared emotional components that are universal to all subjects. Based on this partition, we propose a plug-and-play domain adaptation method for dealing with inter-subject variability. In the training phase, subject-invariant emotional representations and the private components of source subjects are captured separately by a shared encoder and private encoders. We then build one emotion classifier on the shared partition and individual classifiers for each subject on the combination of the two partitions. In the calibration phase, the model requires only a small amount of unlabeled EEG data from each incoming target subject to model their private components. Therefore, besides the shared emotion classifier, we have a second pipeline that uses the knowledge of source subjects through the similarity of private components. In the test phase, we integrate the predictions of the shared emotion classifier with those of the ensemble of individual classifiers, modulated by similarity weights. Experimental results on the SEED dataset show that our model shortens calibration to within a minute while maintaining recognition accuracy, making emotion decoding more generalizable and practical.
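A sketch of the test-time integration step described above; the cosine similarity, softmax temperature, and fixed mixing weight are illustrative assumptions rather than the paper's exact choices.

```python
# Fuse the shared classifier's prediction with a similarity-weighted ensemble
# of source-subject classifiers.
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def fuse_predictions(shared_probs, source_probs, target_private, source_privates,
                     alpha=0.5, tau=1.0):
    """shared_probs: (C,), source_probs: (S, C), target_private: (d,), source_privates: (S, d)."""
    sims = source_privates @ target_private / (
        np.linalg.norm(source_privates, axis=1) * np.linalg.norm(target_private) + 1e-8)
    w = softmax(sims / tau)                        # similarity weights over source subjects
    ensemble = (w[:, None] * source_probs).sum(0)  # weighted individual-classifier prediction
    return alpha * shared_probs + (1 - alpha) * ensemble

rng = np.random.default_rng(0)
shared = softmax(rng.normal(size=3))               # 3 emotion classes
sources = np.stack([softmax(rng.normal(size=3)) for _ in range(5)])
print(fuse_predictions(shared, sources, rng.normal(size=16), rng.normal(size=(5, 16))))
```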