AAAI.2017 - NLP and Text Mining | Cool Papers

#1 Community-Based Question Answering via Asymmetric Multi-Faceted Ranking Network Learning [PDF] [Copy] [Kimi]

Authors: Zhou Zhao ; Hanqing Lu ; Vincent Zheng ; Deng Cai ; Xiaofei He ; Yueting Zhuang

Nowadays the community-based question answering (CQA) sites become the popular Internet-based web service, which have accumulated millions of questions and their posted answers over time. Thus, question answering becomes an essential problem in CQA sites, which ranks the high-quality answers to the given question. Currently, most of the existing works study the problem of question answering based on the deep semantic matching model to rank the answers based on their semantic relevance, while ignoring the authority of answerers to the given question. In this paper, we consider the problem of community-based question answering from the viewpoint of asymmetric multi-faceted ranking network embedding. We propose a novel asymmetric multi-faceted ranking network learning framework for community-based question answering by jointly exploiting the deep semantic relevance between question-answer pairs and the answerers' authority to the given question. We then develop an asymmetric ranking network learning method with deep recurrent neural networks by integrating both answers' relative quality rank to the given question and the answerers' following relations in CQA sites. The extensive experiments on a large-scale dataset from a real world CQA site show that our method achieves better performance than other state-of-the-art solutions to the problem.

#2 Structural Correspondence Learning for Cross-Lingual Sentiment Classification with One-to-Many Mappings [PDF] [Copy] [Kimi]

Authors: Nana Li ; Shuangfei Zhai ; Zhongfei Zhang ; Boying Liu

Structural correspondence learning (SCL) is an effective method for cross-lingual sentiment classification. This approach uses unlabeled documents along with a word translation oracle to automatically induce task specific, cross-lingual correspondences. It transfers knowledge through identifying important features, i.e., pivot features. For simplicity, however, it assumes that the word translation oracle maps each pivot feature in source language to exactly only one word in target language. This one-to-one mapping between words in different languages is too strict. Also the context is not considered at all. In this paper, we propose a cross-lingual SCL based on distributed representation of words; it can learn meaningful one-to-many mappings for pivot words using large amounts of monolingual data and a small dictionary. We conduct experiments on NLP&CC 2013 cross-lingual sentiment analysis dataset, employing English as source language, and Chinese as target language. Our method does not rely on the parallel corpora and the experimental results show that our approach is more competitive than the state-of-the-art methods in cross-lingual sentiment classification.

#3 What Happens Next? Future Subevent Prediction Using Contextual Hierarchical LSTM [PDF] [Copy] [Kimi]

Authors: Linmei Hu ; Juanzi Li ; Liqiang Nie ; Xiao-Li Li ; Chao Shao

Events are typically composed of a sequence of subevents. Predicting a future subevent of an event is of great importance for many real-world applications. Most previous work on event prediction relied on hand-crafted features and can only predict events that already exist in the training data. In this paper, we develop an end-to-end model which directly takes the texts describing previous subevents as input and automatically generates a short text describing a possible future subevent. Our model captures the two-level sequential structure of a subevent sequence, namely, the word sequence for each subevent and the temporal order of subevents. In addition, our model incorporates the topics of the past subevents to make context-aware prediction of future subevents. Extensive experiments on a real-world dataset demonstrate the superiority of our model over several state-of-the-art methods.

#4 Word Embedding Based Correlation Model for Question/Answer Matching [PDF] [Copy] [Kimi]

Authors: Yikang Shen ; Wenge Rong ; Nan Jiang ; Baolin Peng ; Jie Tang ; Zhang Xiong

The large scale of Q&A archives accumulated in community based question answering (CQA) servivces are important information and knowledge resource on the web. Question and answer matching task has been attached much importance to for its ability to reuse knowledge stored in these systems: it can be useful in enhancing user experience with recurrent questions. In this paper, a Word Embedding based Correlation (WEC) model is proposed by integrating advantages of both the translation model and word embedding. Given a random pair of words, WEC can score their co-occurrence probability in Q&A pairs, while it can also leverage the continuity and smoothness of continuous space word representation to deal with new pairs of words that are rare in the training parallel text. An experimental study on Yahoo! Answers dataset and Baidu Zhidao dataset shows this new method's promising potential.

#5 Bootstrapping Distantly Supervised IE Using Joint Learning and Small Well-Structured Corpora [PDF] [Copy] [Kimi]

Authors: Lidong Bing ; Bhuwan Dhingra ; Kathryn Mazaitis ; Jong Hyuk Park ; William W. Cohen

We propose a framework to improve the performance of distantly-supervised relation extraction, by jointly learning to solve two related tasks: concept-instance extraction and relation extraction. We further extend this framework to make a novel use of document structure: in some small, well-structured corpora, sections can be identified that correspond to relation arguments, and distantly-labeled examples from such sections tend to have good precision. Using these as seeds we extract additional relation examples by applying label propagation on a graph composed of noisy examples extracted from a large unstructured testing corpus. Combined with the soft constraint that concept examples should have the same type as the second argument of the relation, we get significant improvements over several state-of-the-art approaches to distantly-supervised relation extraction, and reasonable extraction performance even with very small set of distant labels.

#6 Distant Supervision via Prototype-Based Global Representation Learning [PDF] [Copy] [Kimi]

Authors: Xianpei Han ; Le Sun

Distant supervision (DS) is a promising technique for relation extraction. Currently, most DS approaches build relation extraction models in local instance feature space, often suffer from the multi-instance problem and the missing label problem. In this paper, we propose a new DS method — prototype-based global representation learning, which can effectively resolve the multi-instance problem and the missing label problem by learning informative entity pair representations, and building discriminative extraction models at the entity pair level, rather than at the instance level. Specifically, we propose a prototype-based embedding algorithm, which can embed entity pairs into a prototype-based global feature space; we then propose a neural network model, which can classify entity pairs into target relation types by summarizing relevant information from multiple instances. Experimental results show that our method can achieve significant performance improvement over traditional DS methods.

#7 Improving Event Causality Recognition with Multiple Background Knowledge Sources Using Multi-Column Convolutional Neural Networks [PDF] [Copy] [Kimi]

Authors: Canasai Kruengkrai ; Kentaro Torisawa ; Chikara Hashimoto ; Julien Kloetzer ; Jong-Hoon Oh ; Masahiro Tanaka

We propose a method for recognizing such event causalities as "smoke cigarettes" → "die of lung cancer" using background knowledge taken from web texts as well as original sentences from which candidates for the causalities were extracted. We retrieve texts related to our event causality candidates from four billion web pages by three distinct methods, including a why-question answering system, and feed them to our multi-column convolutional neural networks. This allows us to identify the useful background knowledge scattered in web texts and effectively exploit the identified knowledge to recognize event causalities. We empirically show that the combination of our neural network architecture and background knowledge significantly improves average precision, while the previous state-of-the-art method gains just a small benefit from such background knowledge.

#8 Attentive Interactive Neural Networks for Answer Selection in Community Question Answering [PDF] [Copy] [Kimi]

Authors: Xiaodong Zhang ; Sujian Li ; Lei Sha ; Houfeng Wang

Answer selection plays a key role in community question answering (CQA). Previous research on answer selection usually ignores the problems of redundancy and noise prevalent in CQA. In this paper, we propose to treat different text segments differently and design a novel attentive interactive neural network (AI-NN) to focus on those text segments useful to answer selection. The representations of question and answer are first learned by convolutional neural networks (CNNs) or other neural network architectures. Then AI-NN learns interactions of each paired segments of two texts. Row-wise and column-wise pooling are used afterwards to collect the interactions. We adopt attention mechanism to measure the importance of each segment and combine the interactions to obtain fixed-length representations for question and answer. Experimental results on CQA dataset in SemEval-2016 demonstrate that AI-NN outperforms state-of-the-art method.

#9 Salience Estimation via Variational Auto-Encoders for Multi-Document Summarization [PDF] [Copy] [Kimi]

Authors: Piji Li ; Zihao Wang ; Wai Lam ; Zhaochun Ren ; Lidong Bing

We propose a new unsupervised sentence salience framework for Multi-Document Summarization (MDS), which can be divided into two components: latent semantic modeling and salience estimation. For latent semantic modeling, a neural generative model called Variational Auto-Encoders (VAEs) is employed to describe the observed sentences and the corresponding latent semantic representations. Neural variational inference is used for the posterior inference of the latent variables. For salience estimation, we propose an unsupervised data reconstruction framework, which jointly considers the reconstruction for latent semantic space and observed term vector space. Therefore, we can capture the salience of sentences from these two different and complementary vector spaces. Thereafter, the VAEs-based latent semantic model is integrated into the sentence salience estimation component in a unified fashion, and the whole framework can be trained jointly by back-propagation via multi-task learning. Experimental results on the benchmark datasets DUC and TAC show that our framework achieves better performance than the state-of-the-art models.

#10 Unsupervised Sentiment Analysis with Signed Social Networks [PDF¹] [Copy] [Kimi]

Authors: Kewei Cheng ; Jundong Li ; Jiliang Tang ; Huan Liu

Huge volumes of opinion-rich data is user-generated in social media at an unprecedented rate, easing the analysis of individual and public sentiments. Sentiment analysis has shown to be useful in probing and understanding emotions, expressions and attitudes in the text. However, the distinct characteristics of social media data present challenges to traditional sentiment analysis. First, social media data is often noisy, incomplete and fast-evolved which necessitates the design of a sophisticated learning model. Second, sentiment labels are hard to collect which further exacerbates the problem by not being able to discriminate sentiment polarities. Meanwhile, opportunities are also unequivocally presented. Social media contains rich sources of sentiment signals in textual terms and user interactions, which could be helpful in sentiment analysis. While there are some attempts to leverage implicit sentiment signals in positive user interactions, little attention is paid on signed social networks with both positive and negative links. The availability of signed social networks motivates us to investigate if negative links also contain useful sentiment signals. In this paper, we study a novel problem of unsupervised sentiment analysis with signed social networks. In particular, we incorporate explicit sentiment signals in textual terms and implicit sentiment signals from signed social networks into a coherent model SignedSenti for unsupervised sentiment analysis. Empirical experiments on two real-world datasets corroborate its effectiveness.

#11 Efficient Dependency-Guided Named Entity Recognition [PDF¹] [Copy] [Kimi]

Authors: Zhanming Jie ; Aldrian Muis ; Wei Lu

Named entity recognition (NER), which focuses on the extraction of semantically meaningful named entities and their semantic classes from text, serves as an indispensable component for several down-stream natural language processing (NLP) tasks such as relation extraction and event extraction. Dependency trees, on the other hand, also convey crucial semantic-level information. It has been shown previously that such information can be used to improve the performance of NER. In this work, we investigate on how to better utilize the structured information conveyed by dependency trees to improve the performance of NER. Specifically, unlike existing approaches which only exploit dependency information for designing local features, we show that certain global structured information of the dependency trees can be exploited when building NER models where such information can provide guided learning and inference. Through extensive experiments, we show that our proposed novel dependency-guided NER model performs competitively with models based on conventional semi-Markov conditional random fields, while requiring significantly less running time.

#12 Automatic Emphatic Information Extraction from Aligned Acoustic Data and Its Application on Sentence Compression [PDF] [Copy] [Kimi]

Authors: Yanju Chen ; Rong Pan

We introduce a novel method to extract and utilize the semantic information from acoustic data. By automatic Speech-To-Text alignment techniques, we are able to detect word-based acoustic durations that can prosodically emphasize specific words in an utterance. We model and analyze the sentence-based emphatic patterns by predicting the emphatic levels using only the lexical features, and demonstrate the potential ability of emphatic information produced by such an unsupervised method to improve the performance of NLP tasks, such as sentence compression, by providing weak supervision on multi-task learning based on LSTMs.

#13 Collaborative User Clustering for Short Text Streams [PDF] [Copy] [Kimi]

Authors: Shangsong Liang ; Zhaochun Ren ; Emine Yilmaz ; Evangelos Kanoulas

In this paper, we study the problem of user clustering in the context of their published short text streams. Clustering users by short text streams is more challenging than in the case of long documents associated with them as it is difficult to track users' dynamic interests in streaming sparse data. To obtain better user clustering performance, we propose a user collaborative interest tracking model (UCIT) that aims at tracking changes of each user's dynamic topic distributions in collaboration with their followees', based both on the content of current short texts and the previously estimated distributions. We evaluate our proposed method via a benchmark dataset consisting of Twitter users and their tweets. Experimental results validate the effectiveness of our proposed UCIT model that integrates both users' and their collaborative interests for user clustering by short text streams.

#14 Efficiently Mining High Quality Phrases from Texts [PDF] [Copy] [Kimi]

Authors: Bing Li ; Xiaochun Yang ; Bin Wang ; Wei Cui

Phrase mining is a key research problem for semantic analysis and text-based information retrieval. The existing approaches based on NLP, frequency, and statistics cannot extract high quality phrases and the processing is also time consuming, which are not suitable for dynamic on-line applications. In this paper, we propose an efficient high-quality phrase mining approach (EQPM). To the best of our knowledge, our work is the first effort that considers both intra-cohesion and inter-isolation in mining phrases, which is able to guarantee appropriateness. We also propose a strategy to eliminate order sensitiveness, and ensure the completeness of phrases. We further design efficient algorithms to make the proposed model and strategy feasible. The empirical evaluations on four real data sets demonstrate that our approach achieved a considerable quality improvement and the processing time was 2.3X - 29X faster than the state-of-the-art works.

#15 Greedy Flipping for Constrained Word Deletion [PDF] [Copy] [Kimi]

Authors: Jin-ge Yao ; Xiaojun Wan

In this paper we propose a simple yet efficient method for constrained word deletion to compress sentences, based on top-down greedy local flipping from multiple random initializations. The algorithm naturally integrates various grammatical constraints in the compression process, without using time-consuming integer linear programming solvers. Our formulation suits for any objective function involving arbitrary local score definition. Experimental results show that the proposed method achieves nearly identical performance with explicit ILP formulation while being much more efficient.

#16 Recurrent Neural Networks with Auxiliary Labels for Cross-Domain Opinion Target Extraction [PDF] [Copy] [Kimi]

Authors: Ying Ding ; Jianfei Yu ; Jing Jiang

Opinion target extraction is a fundamental task in opinion mining. In recent years, neural network based supervised learning methods have achieved competitive performance on this task. However, as with any supervised learning method, neural network based methods for this task cannot work well when the training data comes from a different domain than the test data. On the other hand, some rule-based unsupervised methods have shown to be robust when applied to different domains. In this work, we use rule-based unsupervised methods to create auxiliary labels and use neural network models to learn a hidden representation that works well for different domains. When this hidden representation is used for opinion target extraction, we find that it can outperform a number of strong baselines with a large margin.

#17 Using Discourse Signals for Robust Instructor Intervention Prediction [PDF] [Copy] [Kimi]

Authors: Muthu Kumar Chandrasekaran ; Carrie Epp ; Min-Yen Kan ; Diane Litman

We tackle the prediction of instructor intervention in student posts from discussion forums in Massive Open Online Courses (MOOCs). Our key finding is that using automatically obtained discourse relations improves the prediction of when instructors intervene in student discussions, when compared with a state-of-the-art, feature-rich baseline. Our supervised classifier makes use of an automatic discourse parser which outputs Penn Discourse Treebank (PDTB) tags that represent in-post discourse features. We show PDTB relation-based features increase the robustness of the classifier and complement baseline features in recalling more diverse instructor intervention patterns. In comprehensive experiments over 14 MOOC offerings from several disciplines, the PDTB discourse features improve performance on average. The resultant models are less dependent on domain-specific vocabulary, allowing them to better generalize to new courses.

#18 Learning Latent Sentiment Scopes for Entity-Level Sentiment Analysis [PDF¹] [Copy] [Kimi]

Authors: Hao Li ; Wei Lu

In this paper, we focus on the task of extracting named entities together with their associated sentiment information in a joint manner. Our key observation in such an entity-level sentiment analysis (a.k.a. targeted sentiment analysis) task is that there exists a sentiment scope within which each named entity is embedded, which largely decides the sentiment information associated with the entity. However, such sentiment scopes are typically not explicitly annotated in the data, and their lengths can be unbounded. Motivated by this, unlike traditional approaches that cast this problem as a simple sequence labeling task, we propose a novel approach that can explicitly model the latent sentiment scopes. Our experiments on the standard datasets demonstrate that our approach is able to achieve better results compared to existing approaches based on conventional conditional random fields (CRFs) and a more recent work based on neural networks.