IJCAI.2018 - Humans and AI

#1 On the Cost Complexity of Crowdsourcing

Authors: Yili Fang ; Hailong Sun ; Pengpeng Chen ; Jinpeng Huai

Existing efforts mainly use empirical analysis to evaluate the effectiveness of crowdsourcing methods, which is often unreliable across experimental settings. Consequently, it is of great importance to study theoretical methods. This work, for the first time, defines the cost complexity of crowdsourcing, and presents two theorems to compute the cost complexity. Our theorems provide a general theoretical method to model the trade-off between costs and quality, which can be used to evaluate and design crowdsourcing algorithms, and characterize the complexity of crowdsourcing problems. Moreover, following our theorems, we prove a set of corollaries that recover existing theoretical results as special cases. We have verified our work theoretically and empirically.

#2 A Novel Strategy for Active Task Assignment in Crowd Labeling

Authors: Zehong Hu ; Jie Zhang

Active learning strategies are often used in crowd labeling to improve task assignment. However, these strategies require prohibitive computation time yet still cannot improve the assignment to the utmost, because they simply evaluate each possible assignment and then greedily select the optimal one. In this paper, we first derive an efficient algorithm for assignment evaluation. Then, to overcome the uncertainty of labels, we develop a novel strategy that modulates the scope of the greedy task assignment with posterior uncertainty and keeps the evaluation optimistic. The experiments on two popular worker models and four MTurk datasets show that our strategy achieves the best performance and highest computation efficiency.

#3 Deep Learning Based Multi-modal Addressee Recognition in Visual Scenes with Utterances

Authors: Thao Le Minh ; Nobuyuki Shimizu ; Takashi Miyazaki ; Koichi Shinoda

With the widespread use of intelligent systems, such as smart speakers, addressee recognition has become a concern in human-computer interaction, as more and more people expect such systems to understand complicated social scenes, including those outdoors, in cafeterias, and hospitals. Because previous studies typically focused only on pre-specified tasks with limited conversational situations such as controlling smart homes, we created a mock dataset called Addressee Recognition in Visual Scenes with Utterances (ARVSU) that contains a vast body of image variations in visual scenes with an annotated utterance and a corresponding addressee for each scenario. We also propose a multi-modal deep-learning-based model that takes different human cues, specifically eye gazes and transcripts of an utterance corpus, into account to predict the conversational addressee from a specific speaker's view in various real-life conversational scenarios. To the best of our knowledge, we are the first to introduce an end-to-end deep learning model that combines vision and transcripts of utterance for addressee recognition. As a result, our study suggests that future addressee recognition can reach the ability to understand human intention in many social situations previously unexplored, and our modality dataset is a first step in promoting research in this field.

#4 Simultaneous Clustering and Ranking from Pairwise Comparisons

Authors: Jiyi Li ; Yukino Baba ; Hisashi Kashima

When people make decisions about a number of ideas, designs, or other kinds of objects, a natural approach is to organize them into several groups and to prioritize them according to some preference. The grouping task is referred to as clustering and the prioritizing task is called ranking. These tasks are often outsourced with the help of human judgments in the form of pairwise comparisons. In the clustering problem, two objects are compared on whether they are similar, while in the ranking problem, the object of higher priority is determined. Our research question in this paper is whether the pairwise comparisons for clustering also help ranking (and vice versa). Instead of solving the two tasks separately, we propose a unified formulation to bridge the two types of pairwise comparisons. Our formulation simultaneously estimates the object embeddings and the preference criterion vector. The experiments using real datasets support our hypothesis; our approach can generate better neighbor and preference estimation results than the approaches that only focus on a single type of pairwise comparison.

#5 A Novel Neural Network Model based on Cerebral Hemispheric Asymmetry for EEG Emotion Recognition

Authors: Yang Li ; Wenming Zheng ; Zhen Cui ; Tong Zhang ; Yuan Zong

In this paper, we propose a novel neural network model, called bi-hemispheres domain adversarial neural network (BiDANN), for EEG emotion recognition. BiDANN is motivated by neuroscience findings on the emotional brain's asymmetries between the left and right hemispheres. The basic idea of BiDANN is to map the EEG feature data of the left and right hemispheres into discriminative feature spaces separately, in which the data representations can be classified easily. To predict the class labels of testing data more precisely, we narrow the distribution shift between training and testing data by using one global and two local domain discriminators, which work adversarially against the classifier to encourage domain-invariant data representations to emerge. After that, the classifier learned from labeled training data can be applied to unlabeled testing data naturally. We conduct two experiments to verify the performance of our BiDANN model on the SEED database. The experimental results show that the proposed model achieves state-of-the-art performance.

#6 On the Efficiency of Data Collection for Crowdsourced Classification

Authors: Edoardo Manino ; Long Tran-Thanh ; Nicholas R. Jennings

The quality of crowdsourced data is often highly variable. For this reason, it is common to collect redundant data and use statistical methods to aggregate it. Empirical studies show that the policies we use to collect such data have a strong impact on the accuracy of the system. However, there is little theoretical understanding of this phenomenon. In this paper we provide the first theoretical explanation of the accuracy gap between the most popular collection policies: the non-adaptive uniform allocation, and the adaptive uncertainty sampling and information gain maximisation. To do so, we propose a novel representation of the collection process in terms of random walks. Then, we use this tool to derive lower and upper bounds on the accuracy of the policies. With these bounds, we are able to quantify the advantage that the two adaptive policies have over the non-adaptive one for the first time.
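To make the random-walk view of the collection process concrete, here is a minimal simulation sketch. It is not the authors' analysis: it assumes binary tasks, workers who are all correct with the same probability `p_correct`, and majority-vote aggregation, and it compares only uniform allocation with a simple form of uncertainty sampling. The function name `simulate` and all parameter values are illustrative assumptions.

```python
import random


def simulate(policy, n_tasks=200, budget_per_task=5, p_correct=0.7, seed=0):
    """Return the fraction of binary tasks labelled correctly by majority vote.

    Assumptions (not from the paper): identical workers, labels in {-1, +1},
    and a total budget of n_tasks * budget_per_task labels.
    """
    rng = random.Random(seed)
    truth = [rng.choice([-1, 1]) for _ in range(n_tasks)]
    # votes[i] is the running sum of +1/-1 labels for task i,
    # i.e. the position of a one-dimensional random walk.
    votes = [0] * n_tasks
    for step in range(n_tasks * budget_per_task):
        if policy == "uniform":
            # Non-adaptive: round-robin, every task gets the same number of labels.
            i = step % n_tasks
        else:
            # Uncertainty sampling: with identical workers, the most uncertain
            # task is the one whose walk is currently closest to zero.
            i = min(range(n_tasks), key=lambda t: abs(votes[t]))
        label = truth[i] if rng.random() < p_correct else -truth[i]
        votes[i] += label
    # A task counts as correct only if the walk ended on the side of the truth;
    # ties (votes == 0) are counted as errors.
    correct = sum(1 for v, t in zip(votes, truth) if v * t > 0)
    return correct / n_tasks


if __name__ == "__main__":
    for policy in ("uniform", "uncertainty"):
        print(policy, simulate(policy))
```

Running the script prints one empirical accuracy per policy for a single random seed; averaging over many seeds would be needed before reading anything into the gap between them.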

#7 Similarity-Based Reasoning, Raven's Matrices, and General Intelligence

Authors: Can Serif Mekik ; Ron Sun ; David Yun Dai

This paper presents a model tackling a variant of the Raven's Matrices family of human intelligence tests along with computational experiments. Raven's Matrices are thought to challenge human subjects' ability to generalize knowledge and deal with novel situations. We investigate how a generic ability to quickly and accurately generalize knowledge can be succinctly captured by a computational system. This work is distinct from other prominent attempts to deal with the task in terms of adopting a generalized similarity-based approach. Raven's Matrices appear to primarily require similarity-based or analogical reasoning over a set of varied visual stimuli. The similarity-based approach eliminates the need for structure mapping as emphasized in many existing analogical reasoning systems. Instead, it relies on feature-based processing with both relational and non-relational features. Preliminary experimental results suggest that our approach performs comparably to existing symbolic analogy-based models.

#8 NPE: Neural Personalized Embedding for Collaborative Filtering

Authors: ThaiBinh Nguyen ; Atsuhiro Takasu

Matrix factorization is one of the most efficient approaches in recommender systems. However, such algorithms, which rely on the interactions between users and items, perform poorly for "cold-users" (users with little history of such interactions) and at capturing the relationships between closely related items. To address these problems, we propose a neural personalized embedding (NPE) model, which improves the recommendation performance for cold-users and can learn effective representations of items. It models a user's click on an item in two terms: the personal preference of the user for the item, and the relationships between this item and other items clicked by the user. We show that NPE outperforms competing methods for top-N recommendations, especially for cold-user recommendations. We also performed a qualitative analysis that shows the effectiveness of the representations learned by the model.
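The two-term decomposition of a click can be illustrated with a small scoring sketch. This is only an assumed form, not the model from the paper: it treats both terms as dot products of embedding vectors and passes their sum through a sigmoid. The function `click_probability` and its arguments are hypothetical names introduced for illustration.

```python
import math


def dot(a, b):
    """Plain dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def click_probability(user_vec, item_vec, context_vecs):
    """Score a (user, item) pair from two additive terms.

    personal:   how well the item matches the user's own preference.
    relational: how well the item matches other items the user clicked
                (context_vecs holds one embedding per previously clicked item).
    The additive dot-product form is an assumption made for this sketch.
    """
    personal = dot(user_vec, item_vec)
    relational = sum(dot(item_vec, c) for c in context_vecs)
    return 1.0 / (1.0 + math.exp(-(personal + relational)))
```

For a cold user the `personal` term carries little information because `user_vec` is barely trained, so under this sketch the score is driven mainly by the `relational` term over the few items the user has clicked.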

#9 Algorithms for Fair Load Shedding in Developing Countries

Authors: Olabambo I. Oluwasuji ; Obaid Malik ; Jie Zhang ; Sarvapali D. Ramchurn

Due to the limited generation capacity of power stations, many developing countries frequently resort to disconnecting large parts of the power grid from supply, a process termed load shedding. During load shedding, many homes are left without electricity, causing them inconvenience and discomfort. In this paper, we present a number of optimization heuristics that focus on pairwise and groupwise fairness, such that households (i.e. agents) are fairly allocated electricity. We evaluate the heuristics against standard fairness metrics in terms of comfort delivered to homes, as well as the number of times they are disconnected from electricity supply. Thus, we establish new benchmarks for fair load shedding schemes.

#10 Jointly Learning Network Connections and Link Weights in Spiking Neural Networks

Authors: Yu Qi ; Jiangrong Shen ; Yueming Wang ; Huajin Tang ; Hang Yu ; Zhaohui Wu ; Gang Pan

Spiking neural networks (SNNs) are considered to be biologically plausible and power-efficient on neuromorphic hardware. However, unlike the brain mechanisms, most existing SNN algorithms have fixed network topologies and connection relationships. This paper proposes a method to jointly learn network connections and link weights. The connection structures are optimized by the spike-timing-dependent plasticity (STDP) rule with timing information, and the link weights are optimized by a supervised algorithm. The connection structures and the weights are learned alternately until a termination condition is satisfied. Experiments are carried out using four benchmark datasets. Our approach outperforms classical learning methods such as STDP, Tempotron, SpikeProp, and a state-of-the-art supervised algorithm. In addition, the learned structures effectively reduce the number of connections by about 24%, thus improving the computational efficiency of the network.
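For readers unfamiliar with STDP, the classical pair-based rule can be sketched as follows. This is the textbook form, not necessarily the variant used in the paper to optimize connection structures; the function name `stdp_delta` and the amplitude and time-constant values are illustrative assumptions.

```python
import math


def stdp_delta(dt, a_plus=0.01, a_minus=0.012, tau_plus=20.0, tau_minus=20.0):
    """Weight change for one pre/post spike pair under pair-based STDP.

    dt is t_post - t_pre in milliseconds. All constants are illustrative
    assumptions, not values taken from the paper.
    """
    if dt > 0:
        # Pre fires before post: potentiation, decaying with the time gap.
        return a_plus * math.exp(-dt / tau_plus)
    if dt < 0:
        # Post fires before pre: depression, decaying with the time gap.
        return -a_minus * math.exp(dt / tau_minus)
    return 0.0
```

Under this rule, connections whose presynaptic spikes reliably precede postsynaptic spikes are strengthened and the others are weakened, which is one way spike timing can indicate which connections are worth keeping.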

#11 A Simple Convolutional Neural Network for Accurate P300 Detection and Character Spelling in Brain Computer Interface

Authors: Hongchang Shan ; Yu Liu ; Todor Stefanov

A Brain Computer Interface (BCI) character speller allows human beings to directly spell characters using eye-gazes, thereby building communication between the human brain and a computer. Convolutional Neural Networks (CNNs) have shown better performance than traditional machine learning methods for BCI signal recognition and its application to the character speller. However, current CNN architectures limit further accuracy improvements of signal detection and character spelling and also need high complexity to achieve competitive accuracy, thereby preventing the use of CNNs in portable BCIs. To address these issues, we propose a novel and simple CNN which effectively learns feature representations from both raw temporal information and raw spatial information. The complexity of the proposed CNN is significantly reduced compared with state-of-the-art CNNs for BCI signal detection. We perform experiments on three benchmark datasets and compare our results with those of previous works that report the best results. The comparison shows that our proposed CNN can increase the signal detection accuracy by up to 15.61% and the character spelling accuracy by up to 19.35%.

#12 Cross-Domain Depression Detection via Harvesting Social Media

Authors: Tiancheng Shen ; Jia Jia ; Guangyao Shen ; Fuli Feng ; Xiangnan He ; Huanbo Luan ; Jie Tang ; Thanassis Tiropanis ; Tat-Seng Chua ; Wendy Hall

Depression detection is a significant issue for human well-being. In previous studies, online detection has proven effective on Twitter, enabling proactive care for depressed users. However, owing to cultural differences, replicating the method on other social media platforms, such as Chinese Weibo, might lead to poor performance because of insufficient available labeled (self-reported depression) data for model training. In this paper, we study an interesting but challenging problem of enhancing detection in a certain target domain (e.g. Weibo) with ample Twitter data as the source domain. We first systematically analyze the depression-related feature patterns across domains and summarize two major detection challenges, namely isomerism and divergency. We further propose a cross-domain Deep Neural Network model with Feature Adaptive Transformation & Combination strategy (DNN-FATC) that transfers the relevant information across heterogeneous domains. Experiments demonstrate improved performance compared to existing heterogeneous transfer methods or training directly in the target domain (over 3.4% improvement in F1), indicating the potential of our model to enable depression detection via social media for more countries with different cultural settings.

#13 Synthesizing Pattern Programs from Examples

Authors: Sunbeom So ; Hakjoo Oh

We describe a programming-by-example system that automatically generates pattern programs from examples. Writing pattern programs, which produce various patterns of characters, is one of the most popular programming exercises for entry-level students. However, students often find it difficult to write correct solutions by themselves. In this paper, we present a method for synthesizing pattern programs from examples, allowing students to improve their programming skills efficiently. To that end, we first design a domain-specific language that supports a large class of pattern programs that students struggle with. Next, we develop a synthesis algorithm that efficiently finds a desired program by combining enumerative search, constraint solving, and program analysis. We implemented the algorithm in a tool and evaluated it on 40 exercises gathered from online forums. The experimental results and user study show that our tool can synthesize instructive solutions from 1–3 example patterns in 1.2 seconds on average.

#14 Learning Sequential Correlation for User Generated Textual Content Popularity Prediction

Authors: Wen Wang ; Wei Zhang ; Jun Wang ; Junchi Yan ; Hongyuan Zha

Popularity prediction of user generated textual content is critical for prioritizing information on the web, which alleviates heavy information overload for ordinary readers. Most previous studies model each content instance separately for prediction and thus overlook the sequential correlations between instances of a specific user. In this paper, we go deeper into this problem based on two observations for each user, i.e., sequential content correlation and sequential popularity correlation. We propose a novel deep sequential model called User Memory-augmented recurrent Attention Network (UMAN). This model encodes the two correlations by updating external user memories, which are further leveraged for target text representation learning and popularity prediction. The experimental results on several real-world datasets validate the benefits of considering these correlations and demonstrate that UMAN achieves the best performance among several strong competitors.

#15 Neural Framework for Joint Evolution Modeling of User Feedback and Social Links in Dynamic Social Networks

Authors: Peizhi Wu ; Yi Tu ; Xiaojie Yuan ; Adam Jatowt ; Zhenglu Yang

Modeling the evolution of user feedback and social links in dynamic social networks is of considerable significance, because it is the basis of many applications, including recommendation systems and user behavior analyses. Most of the existing methods in this area model user behaviors separately and consider only certain aspects of this problem, such as dynamic preferences of users, dynamic attributes of items, evolutions of social networks, and their partial integration. This work proposes a comprehensive general neural framework with several optimal strategies to jointly model the evolution of user feedback and social links. The framework considers the dynamic user preferences, dynamic item attributes, and time-dependent social links in time-evolving social networks. Experiments conducted on two real-world datasets demonstrate that our proposed model performs remarkably better than state-of-the-art methods.

#16 Memory Attention Networks for Skeleton-based Action Recognition

Authors: Chunyu Xie ; Ce Li ; Baochang Zhang ; Chen Chen ; Jungong Han ; Jianzhuang Liu

The skeleton-based action recognition task is entangled with complex spatio-temporal variations of skeleton joints, and remains challenging for Recurrent Neural Networks (RNNs). In this work, we propose a temporal-then-spatial recalibration scheme to alleviate such complex variations, resulting in end-to-end Memory Attention Networks (MANs), which consist of a Temporal Attention Recalibration Module (TARM) and a Spatio-Temporal Convolution Module (STCM). Specifically, the TARM is deployed in a residual learning module that employs a novel attention learning network to recalibrate the temporal attention of frames in a skeleton sequence. The STCM treats the attention-calibrated skeleton joint sequences as images and leverages Convolutional Neural Networks (CNNs) to further model the spatial and temporal information of skeleton data. These two modules (TARM and STCM) seamlessly form a single network architecture that can be trained in an end-to-end fashion. MANs significantly boost the performance of skeleton-based action recognition and achieve the best results on four challenging benchmark datasets: NTU RGB+D, HDM05, SYSU-3D and UT-Kinect.

#17 CSNN: An Augmented Spiking based Framework with Perceptron-Inception

Authors: Qi Xu ; Yu Qi ; Hang Yu ; Jiangrong Shen ; Huajin Tang ; Gang Pan

Spiking Neural Networks (SNNs) represent and transmit information in spikes, which is considered more biologically realistic and computationally powerful than the traditional Artificial Neural Networks. The spiking neurons encode useful temporal information and are highly robust to noise. The feature extraction ability of typical SNNs is limited by shallow structures. This paper focuses on improving the feature extraction ability of SNNs by virtue of the powerful feature extraction ability of Convolutional Neural Networks (CNNs). CNNs can extract abstract features by means of the structure of the convolutional feature maps. We propose a CNN-SNN (CSNN) model to combine the feature learning ability of CNNs with the cognition ability of SNNs. The CSNN model learns the encoded spatio-temporal representations of images in an event-driven way. We evaluate the CSNN model on the MNIST handwritten digit image dataset and its variants. In the presented experimental results, the proposed CSNN model is evaluated regarding learning capabilities, encoding mechanisms, robustness to noisy stimuli and its classification performance. The results show that CSNN behaves well compared to other cognitive models with significantly fewer neurons and training samples. Our work brings more biological realism into modern image classification models, with the hope that these models can inform how the brain performs this high-level vision task.

#18 Brain-inspired Balanced Tuning for Spiking Neural Networks

Authors: Tielin Zhang ; Yi Zeng ; Dongcheng Zhao ; Bo Xu

Due to the nature of Spiking Neural Networks (SNNs), it is challenging to train them with biologically plausible learning principles. Multi-layered SNNs have non-differentiable neurons and temporary-centric synapses, which make them nearly impossible to tune directly by backpropagation. Here we propose an alternative, biologically inspired balanced tuning approach to train SNNs. The approach draws on three main inspirations from the brain: first, a biological network is usually trained towards the state where the temporal updates of variables (e.g. membrane potential) are at equilibrium; second, specific proportions of excitatory and inhibitory neurons usually contribute to stable representations; third, short-term plasticity (STP) is a general principle that keeps the input and output of synapses balanced, leading to better learning convergence. With these inspirations, we train SNNs in three steps: first, the SNN model is trained with the three brain-inspired principles; then, weakly supervised learning is used to tune the membrane potential in the final layer for network classification; finally, the learned information is consolidated from the membrane potential into the synaptic weights by Spike-Timing Dependent Plasticity (STDP). The proposed approach is verified on the MNIST handwritten digit recognition dataset, and the performance (an accuracy of 98.64%) indicates that balancing states can indeed improve the learning ability of SNNs, which shows the power of the proposed brain-inspired approach for tuning biologically plausible SNNs.

#19 Personality-Aware Personalized Emotion Recognition from Physiological Signals

Authors: Sicheng Zhao ; Guiguang Ding ; Jungong Han ; Yue Gao

Emotion recognition methodologies from physiological signals are increasingly becoming personalized, due to the subjective responses of different subjects to physical stimuli. Existing works mainly focused on modelling the involved physiological corpus of each subject, without considering the psychological factors. The latent correlation among different subjects has also been rarely examined. We propose to investigate the influence of personality on emotional behavior in a hypergraph learning framework. Assuming that each vertex is a compound tuple (subject, stimuli), multi-modal hypergraphs can be constructed based on the personality correlation among different subjects and on the physiological correlation among corresponding stimuli. To reflect the differing importance of vertices, hyperedges, and modalities, we assign weights to each of them. The emotion relevance learned on the vertex-weighted multi-modal multi-task hypergraphs is employed for emotion recognition. We carry out extensive experiments on the ASCERTAIN dataset and the results demonstrate the superiority of the proposed method.