IJCAI.2019 - Natural Language Processing

| Total: 85

#1 Early Discovery of Emerging Entities in Microblogs [PDF] [Copy] [Kimi] [REL]

Authors: Satoshi Akasaki, Naoki Yoshinaga, Masashi Toyoda

Keeping up to date on emerging entities that appear every day is indispensable for various applications, such as social-trend analysis and marketing research. Previous studies have attempted to detect entities that are not registered in a particular knowledge base as emerging entities, but they consequently pick up non-emerging entities as well, since the absence of an entity from a knowledge base does not guarantee that it is emerging. We therefore introduce a novel task of discovering truly emerging entities when they have just been introduced to the public through microblogs, and propose an effective method based on time-sensitive distant supervision, which exploits the distinctive early-stage contexts of emerging entities. Experimental results with a large-scale Twitter archive show that the proposed method achieves 83.2% precision for the top 500 discovered emerging entities, outperforming baselines based on unseen-entity recognition with burst detection. Besides notable emerging entities, our method can discover massive numbers of long-tail and homographic emerging entities. An evaluation of relative recall shows that the method detects 80.4% of the emerging entities newly registered in Wikipedia; 92.8% of them are discovered earlier than their registration in Wikipedia, with an average lead time of more than one year (578 days).
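
The exact supervision rule is not given in the abstract; below is a minimal sketch of the time-sensitive distant supervision idea, assuming a fixed early-stage window (`window_days` is a hypothetical parameter):

```python
from datetime import timedelta

def label_contexts(posts, first_seen, window_days=7):
    """Hedged sketch of time-sensitive distant supervision (not the paper's
    exact rule): posts mentioning a known-emerging entity within `window_days`
    of its first appearance are treated as positive 'early-stage' contexts;
    later mentions of the same entity become negatives, so a classifier learns
    the distinctive contexts surrounding an entity that has just emerged.

    posts: iterable of (entity, timestamp, text); first_seen: entity -> timestamp.
    """
    window = timedelta(days=window_days)
    labeled = []
    for entity, ts, text in posts:
        label = 1 if ts - first_seen[entity] <= window else 0
        labeled.append((text, label))
    return labeled
```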


#2 Neural Program Induction for KBQA Without Gold Programs or Query Annotations [PDF] [Copy] [Kimi] [REL]

Authors: Ghulam Ahmed Ansari, Amrita Saha, Vishwajeet Kumar, Mohan Bhambhani, Karthik Sankaranarayanan, Soumen Chakrabarti

Neural Program Induction (NPI) is a paradigm for decomposing high-level tasks, such as complex question answering over knowledge bases (KBQA), into executable programs by employing neural models. Typically, this involves two key phases: i) inferring input program variables from the high-level task description, and ii) generating the correct program sequence involving these variables. Here we focus on NPI for complex KBQA with only the final answer as supervision, without gold programs. This raises major challenges: i) noisy query annotation in the absence of gold supervision can lead to catastrophic forgetting during learning, and ii) the reward becomes extremely sparse owing to this noise. To deal with these, we propose a noise-resilient NPI model, Stable Sparse Reward based Programmer (SSRP), that evades noise-induced instability through continual retrospection and comparison with its current learning behavior. On complex KBQA datasets, SSRP performs on par with hand-crafted rule-based models when provided with gold program input, and in the noisy settings it outperforms state-of-the-art models by a significant margin even with a noisier query annotator.


#3 Medical Concept Representation Learning from Multi-source Data [PDF] [Copy] [Kimi] [REL]

Authors: Tian Bai, Brian L. Egleston, Richard Bleicher, Slobodan Vucetic

Representing words as low-dimensional vectors is very useful in many natural language processing tasks. This idea has been extended to the medical domain, where medical codes listed in medical claims are represented as vectors to facilitate exploratory analysis and predictive modeling. However, depending on the type of medical provider, medical claims can use medical codes from different ontologies or from a combination of ontologies, which complicates learning of the representations. To properly utilize such multi-source medical claim data, we propose an approach that represents medical codes from different ontologies in the same vector space. We first modify the Pointwise Mutual Information (PMI) measure of similarity between the codes. We then develop a new negative sampling method for the word2vec model that implicitly factorizes the modified PMI matrix. The new approach was evaluated on the code cross-reference problem, which aims at identifying similar codes across different ontologies. In our experiments, we evaluated cross-referencing between the ICD-9 and CPT medical code ontologies. Our results indicate that the vector representations of codes learned by the proposed approach provide superior cross-referencing compared to several existing approaches.
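
As a starting point, the standard PMI that the paper modifies can be computed directly from a code co-occurrence count matrix; a minimal numpy sketch (the paper's actual modification and its implicit factorization via negative sampling are not reproduced here):

```python
import numpy as np

def pmi_matrix(cooc, eps=1e-12):
    """Pointwise mutual information from a code-by-code co-occurrence count
    matrix (e.g., codes appearing in the same claim). The paper modifies this
    measure for multi-ontology codes and factorizes the result implicitly
    through a new word2vec negative sampler; this sketch shows only the
    standard PMI it starts from."""
    total = cooc.sum()
    p_xy = cooc / total                            # joint probability estimates
    p_x = cooc.sum(axis=1, keepdims=True) / total  # marginals
    p_y = cooc.sum(axis=0, keepdims=True) / total
    return np.log((p_xy + eps) / (p_x * p_y + eps))
```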


#4 Multi-Domain Sentiment Classification Based on Domain-Aware Embedding and Attention [PDF] [Copy] [Kimi] [REL]

Authors: Yitao Cai, Xiaojun Wan

Sentiment classification is a fundamental task in NLP. However, as many studies have revealed, sentiment classification models are highly domain-dependent. It is therefore worth investigating how to leverage data from different domains to improve the classification performance in each domain. In this work, we propose a novel completely-shared multi-domain neural sentiment classification model that learns domain-aware word embeddings and makes use of a domain-aware attention mechanism. Our model first utilizes a BiLSTM for domain classification and extracts domain-specific features for words, which are then combined with general word embeddings to form domain-aware word embeddings. The domain-aware word embeddings are fed into another BiLSTM to extract sentence features. The domain-aware attention mechanism is used to select significant features, using the domain-aware sentence representation as the query vector. Evaluation results on public datasets with 16 different domains demonstrate the efficacy of our proposed model. Further experiments show the generalization ability and the transferability of our model.
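
A minimal PyTorch sketch of the architecture as described; the layer sizes and fusion by simple concatenation are our assumptions, not the authors' exact design:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class DomainAwareEncoder(nn.Module):
    """Hedged sketch: a BiLSTM trained for domain classification yields
    domain-specific word features, which are concatenated with general
    embeddings and read by a second BiLSTM with domain-aware attention."""
    def __init__(self, vocab, emb_dim=100, hid=64, n_domains=16):
        super().__init__()
        self.emb = nn.Embedding(vocab, emb_dim)
        self.dom_lstm = nn.LSTM(emb_dim, hid, batch_first=True, bidirectional=True)
        self.dom_clf = nn.Linear(2 * hid, n_domains)
        self.sent_lstm = nn.LSTM(emb_dim + 2 * hid, hid, batch_first=True, bidirectional=True)
        self.att = nn.Linear(2 * hid, 2 * hid)

    def forward(self, tokens):
        e = self.emb(tokens)                          # (B, T, emb_dim)
        dom_feat, _ = self.dom_lstm(e)                # domain-specific word features
        dom_logits = self.dom_clf(dom_feat.mean(dim=1))   # auxiliary domain loss
        aware = torch.cat([e, dom_feat], dim=-1)      # domain-aware embeddings
        h, _ = self.sent_lstm(aware)                  # (B, T, 2*hid)
        query = h.mean(dim=1)                         # domain-aware sentence query
        scores = torch.bmm(h, self.att(query).unsqueeze(-1)).squeeze(-1)
        alpha = F.softmax(scores, dim=-1)             # domain-aware attention
        sent = torch.bmm(alpha.unsqueeze(1), h).squeeze(1)
        return sent, dom_logits
```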


#5 A Latent Variable Model for Learning Distributional Relation Vectors [PDF] [Copy] [Kimi] [REL]

Authors: Jose Camacho-Collados, Luis Espinosa-Anke, Shoaib Jameel, Steven Schockaert

Recently a number of unsupervised approaches have been proposed for learning vectors that capture the relationship between two words. Inspired by word embedding models, these approaches rely on co-occurrence statistics that are obtained from sentences in which the two target words appear. However, the number of such sentences is often quite small, and most of the words that occur in them are not relevant for characterizing the considered relationship. As a result, standard co-occurrence statistics typically lead to noisy relation vectors. To address this issue, we propose a latent variable model that aims to explicitly determine what words from the given sentences best characterize the relationship between the two target words. Relation vectors then correspond to the parameters of a simple unigram language model which is estimated from these words.
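
One natural reading of this model is a two-component mixture of a relation-specific unigram language model and a fixed background model, fit by EM; the sketch below is that hedged reading, not necessarily the authors' exact formulation:

```python
import numpy as np

def relation_unigram_em(counts, p_bg, n_iter=50, lam=0.5):
    """EM for a mixture of a relation-specific unigram LM and a fixed
    background LM (a hedged reading of the abstract). counts: word-count
    vector from sentences containing both target words; p_bg: background
    unigram probabilities. Returns the relation LM, whose parameters serve
    as the relation vector."""
    p_rel = counts / counts.sum()                 # initialize from raw counts
    for _ in range(n_iter):
        # E-step: responsibility that each word token is "relational"
        r = lam * p_rel / (lam * p_rel + (1 - lam) * p_bg + 1e-12)
        # M-step: re-estimate the relation LM from soft counts
        soft = counts * r
        p_rel = soft / (soft.sum() + 1e-12)
        lam = soft.sum() / counts.sum()           # update the mixture weight
    return p_rel
```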


#6 Generating Multiple Diverse Responses with Multi-Mapping and Posterior Mapping Selection [PDF] [Copy] [Kimi] [REL]

Authors: Chaotao Chen, Jinhua Peng, Fan Wang, Jun Xu, Hua Wu

In human conversation, an input post is open to multiple potential responses, which is typically regarded as a one-to-many problem. Promising approaches mainly incorporate multiple latent mechanisms to build the one-to-many relationship. However, without accurate selection of the latent mechanism corresponding to the target response during training, these methods suffer from a rough optimization of latent mechanisms. In this paper, we propose a multi-mapping mechanism to better capture the one-to-many relationship, where multiple mapping modules are employed as latent mechanisms to model the semantic mappings from an input post to its diverse responses. For accurate optimization of the latent mechanisms, a posterior mapping selection module is designed to select the corresponding mapping module according to the target response for further optimization. We also introduce an auxiliary matching loss to facilitate the optimization of posterior mapping selection. Empirical results demonstrate the superiority of our model in generating multiple diverse and informative responses over state-of-the-art methods.
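
A hedged sketch of the multi-mapping idea, with linear mapping modules and a dot-product posterior matching score as illustrative choices (the paper's modules and score are likely richer):

```python
import torch
import torch.nn as nn

class MultiMapping(nn.Module):
    """Hedged sketch: K mapping modules model one-to-many post->response
    semantics; during training a posterior step selects the mapping whose
    output best matches the target response encoding."""
    def __init__(self, dim, k=5):
        super().__init__()
        self.maps = nn.ModuleList([nn.Linear(dim, dim) for _ in range(k)])

    def forward(self, post_vec, response_vec=None):
        cands = torch.stack([torch.tanh(m(post_vec)) for m in self.maps], dim=1)  # (B, K, D)
        if response_vec is None:                        # inference: sample a mapping
            idx = torch.randint(len(self.maps), (post_vec.size(0),))
        else:                                           # training: posterior selection
            scores = torch.bmm(cands, response_vec.unsqueeze(-1)).squeeze(-1)     # (B, K)
            idx = scores.argmax(dim=-1)
        return cands[torch.arange(cands.size(0)), idx]
```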


#7 Sentiment-Controllable Chinese Poetry Generation [PDF] [Copy] [Kimi] [REL]

Authors: Huimin Chen, Xiaoyuan Yi, Maosong Sun, Wenhao Li, Cheng Yang, Zhipeng Guo

Expressing diverse sentiments is one of the main purposes of human poetry creation. Existing Chinese poetry generation models have made great progress in poetry quality, but they have all neglected to endow generated poems with specific sentiments. This defect leads to strong sentiment collapse or bias and thus hurts the diversity and semantics of generated poems. Meanwhile, few sentiment-annotated Chinese poetry resources are available for study. To address this problem, we first collect a manually labelled sentimental poetry corpus with fine-grained sentiment labels. Then we propose a novel semi-supervised conditional Variational Auto-Encoder model for sentiment-controllable poetry generation. Besides, since poetry is discourse-level text where the polarity and intensity of sentiment can shift across lines, we incorporate a temporal module to capture sentiment transition patterns among different lines. Experimental results show that our model can control the sentiment of not only a whole poem but also each line, and it improves poetry diversity over state-of-the-art models without losing quality.


#8 From Words to Sentences: A Progressive Learning Approach for Zero-resource Machine Translation with Visual Pivots [PDF] [Copy] [Kimi] [REL]

Authors: Shizhe Chen, Qin Jin, Jianlong Fu

Neural machine translation models suffer from the lack of large-scale parallel corpora. In contrast, we humans can learn multi-lingual translations even without parallel texts, by grounding our languages in the external world. To mimic such human learning behavior, we employ images as pivots to enable zero-resource translation learning. However, since a picture is worth a thousand words, multi-lingual sentences pivoted by the same image are noisy as mutual translations, which hinders translation model learning. In this work, we propose a progressive learning approach for image-pivoted zero-resource machine translation. Since words are less diverse when grounded in the image, we first learn word-level translation with image pivots, and then progress to sentence-level translation by utilizing the learned word translations to suppress noise in image-pivoted multi-lingual sentences. Experimental results on two widely used image-pivot translation datasets, IAPR-TC12 and Multi30k, show that the proposed approach significantly outperforms other state-of-the-art methods.


#9 Learning towards Abstractive Timeline Summarization [PDF] [Copy] [Kimi] [REL]

Authors: Xiuying Chen, Zhangming Chan, Shen Gao, Meng-Hsuan Yu, Dongyan Zhao, Rui Yan

Timeline summarization aims to concisely summarize an evolution trajectory along a timeline, and existing approaches are all based on extractive methods. In this paper, we propose the task of abstractive timeline summarization, which aims to concisely paraphrase the information in time-stamped events. Unlike traditional document summarization, timeline summarization needs to model the time-series information of the input events and summarize important events in chronological order. To tackle this challenge, we propose a memory-based timeline summarization model (MTS). Concretely, we propose a time-event memory to establish a timeline, and use the time position of events on this timeline to guide the generation process. Besides, in each decoding step, we incorporate event-level information into word-level attention to avoid confusion between events. Extensive experiments are conducted on a large-scale real-world dataset, and the results show that MTS achieves state-of-the-art performance in terms of both automatic and human evaluations.


#10 Coreference Aware Representation Learning for Neural Named Entity Recognition [PDF] [Copy] [Kimi] [REL]

Authors: Zeyu Dai, Hongliang Fei, Ping Li

Recent neural network models have achieved state-of-the-art performance on the task of named entity recognition (NER). However, previous neural network models typically treat the input sentences as a linear sequence of words and ignore rich structural information, such as the coreference relations among non-adjacent words, phrases or entities. In this paper, we propose a novel approach to learn coreference-aware word representations for the NER task at the document level. In particular, we enrich the well-known "CNN-BiLSTM-CRF" neural architecture with a coreference layer on top of the BiLSTM layer to incorporate coreferential relations. Furthermore, we introduce a coreference regularization to encourage coreferential entities to share similar representations and consistent predictions within the same coreference cluster. Our proposed model achieves new state-of-the-art performance on two NER benchmarks: CoNLL-2003 and OntoNotes v5.0. More importantly, we demonstrate that our framework does not rely on gold coreference knowledge, and can still work well even when the coreferential relations are generated by a third-party toolkit.
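
A minimal sketch of what such a coreference regularizer could look like, pulling mention representations toward their cluster centroid; the paper's exact formulation may differ:

```python
import torch

def coref_regularizer(reps, clusters):
    """Hedged sketch of a coreference regularization term: penalize the
    spread of mention representations within each coreference cluster so
    that coreferential mentions share similar representations.
    reps: (N, D) mention representations; clusters: list of index lists."""
    loss = reps.new_zeros(())
    for idxs in clusters:
        if len(idxs) < 2:
            continue
        members = reps[torch.tensor(idxs)]              # (M, D)
        centroid = members.mean(dim=0, keepdim=True)
        loss = loss + ((members - centroid) ** 2).sum(dim=-1).mean()
    return loss
```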


#11 Learning Assistance from an Adversarial Critic for Multi-Outputs Prediction [PDF] [Copy] [Kimi] [REL]

Authors: Yue Deng, Yilin Shen, Hongxia Jin

We introduce an adversarial-critic-and-assistant (ACA) learning framework to improve the performance of existing supervised learning with multiple outputs. The core contribution of our ACA is the innovation of two novel modules, i.e., an 'adversarial critic' and a 'collaborative assistant', that are jointly designed to provide augmenting information for facilitating general learning tasks. Our approach is not intended as a competitor to the many well-established algorithms in the field. In fact, most existing approaches, while implemented with different learning objectives, can all be adopted as building blocks and seamlessly integrated into the ACA framework to accomplish various real-world tasks. We show the performance and generalization ability of ACA on diverse learning tasks including multi-label classification, attribute prediction and sequence-to-sequence generation.


#12 End-to-End Multi-Perspective Matching for Entity Resolution [PDF] [Copy] [Kimi] [REL]

Authors: Cheng Fu, Xianpei Han, Le Sun, Bo Chen, Wei Zhang, Suhui Wu, Hao Kong

Entity resolution (ER) aims to identify data records referring to the same real-world entity. Due to the heterogeneity of entity attributes and the diversity of similarity measures, one main challenge of ER is how to select appropriate similarity measures for different attributes. Previous ER methods usually employ heuristic similarity selection algorithms, which are highly specialized to specific ER problems and hard to generalize to other situations. Furthermore, previous studies usually perform similarity learning and similarity selection independently, which often results in error propagation and is hard to optimize globally. To resolve these problems, this paper proposes an end-to-end multi-perspective entity matching model, which can adaptively select optimal similarity measures for heterogeneous attributes by jointly learning and selecting similarity measures in an end-to-end way. Experiments on two real-world datasets show that our method significantly outperforms previous ER methods.
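
One simple way to realize learned similarity selection is a softmax gate over several similarity perspectives per attribute; a hedged PyTorch sketch in which the gating form and the three example measures are our assumptions:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SimilaritySelector(nn.Module):
    """Hedged sketch of adaptive similarity selection: compute several
    similarity perspectives for an attribute pair and let a learned softmax
    gate weight them, so selection and matching are trained jointly."""
    def __init__(self, n_measures=3):
        super().__init__()
        self.gate = nn.Parameter(torch.zeros(n_measures))  # one weight per measure

    def forward(self, a, b):
        # Three example perspectives over attribute embeddings a, b: (B, D)
        cos = F.cosine_similarity(a, b, dim=-1)
        l2 = -torch.norm(a - b, dim=-1)
        dot = (a * b).sum(dim=-1)
        sims = torch.stack([cos, l2, dot], dim=-1)         # (B, 3)
        return (F.softmax(self.gate, dim=0) * sims).sum(dim=-1)
```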


#13 Difficulty Controllable Generation of Reading Comprehension Questions [PDF] [Copy] [Kimi] [REL]

Authors: Yifan Gao, Lidong Bing, Wang Chen, Michael Lyu, Irwin King

We investigate the difficulty levels of questions in reading comprehension datasets such as SQuAD, and propose a new question generation setting, named Difficulty-controllable Question Generation (DQG). Taking as input a sentence in the reading comprehension paragraph and some of its text fragments (i.e., answers) that we want to ask questions about, a DQG method needs to generate questions each of which has a given text fragment as its answer, with the generation under the control of specified difficulty labels: the output questions should satisfy the specified difficulty as much as possible. To solve this task, we propose an end-to-end framework to generate questions of designated difficulty levels by exploring a few important intuitions. For evaluation, we prepared the first dataset of reading comprehension questions with difficulty labels. The results show that the questions generated by our framework not only have better quality under metrics such as BLEU, but also comply with the specified difficulty labels.
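
One straightforward way to realize difficulty control, sketched below, is to feed a learned difficulty-label embedding into the decoder at every step; this is an illustrative design, not necessarily the paper's:

```python
import torch
import torch.nn as nn

class DifficultyConditionedDecoder(nn.Module):
    """Hedged sketch: condition a seq2seq decoder on a difficulty label by
    concatenating a learned difficulty embedding to every decoder input."""
    def __init__(self, vocab, emb_dim=128, hid=256, n_levels=2):
        super().__init__()
        self.tok_emb = nn.Embedding(vocab, emb_dim)
        self.diff_emb = nn.Embedding(n_levels, emb_dim)   # e.g., easy / hard
        self.rnn = nn.GRU(2 * emb_dim, hid, batch_first=True)
        self.out = nn.Linear(hid, vocab)

    def forward(self, prev_tokens, difficulty, h0=None):
        e = self.tok_emb(prev_tokens)                     # (B, T, E)
        d = self.diff_emb(difficulty).unsqueeze(1).expand_as(e)
        h, hn = self.rnn(torch.cat([e, d], dim=-1), h0)   # difficulty at each step
        return self.out(h), hn
```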


#14 Modeling Source Syntax and Semantics for Neural AMR Parsing [PDF] [Copy] [Kimi] [REL]

Authors: DongLai Ge, Junhui Li, Muhua Zhu, Shoushan Li

Sequence-to-sequence (seq2seq) approaches formalize Abstract Meaning Representation (AMR) parsing as a translation task from a source sentence to a target AMR graph. However, previous studies generally model a source sentence as a word sequence and ignore the inherent syntactic and semantic information in the sentence. In this paper, we propose two effective approaches to explicitly model source syntax and semantics in neural seq2seq AMR parsing. The first approach linearizes the source syntactic and semantic structure into a mixed sequence of words, syntactic labels, and semantic labels, while in the second approach we propose a syntactic- and semantic-structure-aware encoding scheme based on a self-attentive model to explicitly capture syntactic and semantic relations between words. Experimental results on an English benchmark dataset show that our two approaches achieve significant improvements of 3.1% and 3.4% in F1 score over a strong seq2seq baseline.
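
A toy illustration of the first (linearization) approach; the label inventory and placement here are illustrative, not the paper's exact scheme:

```python
def linearize(tokens, syn_labels, sem_labels):
    """Mix words with their syntactic and semantic labels in one flat input
    sequence, so a plain seq2seq encoder can read the structure."""
    mixed = []
    for tok, syn, sem in zip(tokens, syn_labels, sem_labels):
        mixed += [syn, sem, tok]
    return mixed

# e.g. linearize(["John", "runs"], ["NP", "VP"], ["ARG0", "PRED"])
# -> ['NP', 'ARG0', 'John', 'VP', 'PRED', 'runs']
```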


#15 CNN-Based Chinese NER with Lexicon Rethinking [PDF] [Copy] [Kimi] [REL]

Authors: Tao Gui, Ruotian Ma, Qi Zhang, Lujun Zhao, Yu-Gang Jiang, Xuanjing Huang

Character-level Chinese named entity recognition (NER) that applies long short-term memory (LSTM) networks to incorporate lexicons has achieved great success. However, this method fails to fully exploit GPU parallelism, and candidate lexicon words can conflict. In this work, we propose a faster alternative for Chinese NER: a convolutional neural network (CNN)-based method that incorporates lexicons using a rethinking mechanism. The proposed method can model all the characters and potential words that match the sentence in parallel. In addition, the rethinking mechanism can resolve word conflicts by feeding back high-level features to refine the network. Experimental results on four datasets show that the proposed method achieves better performance than both word-level and character-level baseline methods. In addition, the proposed method runs up to 3.21 times faster than state-of-the-art methods while achieving better performance.


#16 Dual Visual Attention Network for Visual Dialog [PDF] [Copy] [Kimi] [REL]

Authors: Dan Guo, Hui Wang, Meng Wang

Visual dialog is a challenging task that involves multi-round semantic transformations between vision and language. This paper addresses cross-modal semantic correlation for visual dialog. Motivated by the observation that global vision (Vg), local vision (Vl), the question (Q) and the dialog history (H) are inseparably related, the paper proposes a novel Dual Visual Attention Network (DVAN) to realize the mapping (Vg, Vl, Q, H) → A. DVAN is a three-stage query-adaptive attention model. To acquire an accurate answer A, it first applies textual attention, which imposes the question on the history to pick out the related context H'. Then, based on Q and H', it applies separate visual attentions to discover the related global visual hints Vg' and local object-based visual hints Vl'. Next, a dual crossing visual attention is proposed: Vg' and Vl' are mutually embedded to learn complementary visual semantics. Finally, the attended textual and visual features are combined to infer the answer. Experimental results on the VisDial v0.9 and v1.0 datasets validate the effectiveness of the proposed approach.
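
A hedged sketch of the three-stage attention flow with a generic soft-attention primitive; the fusion operations (addition, concatenation) are simplified assumptions:

```python
import torch
import torch.nn.functional as F

def attend(query, keys):
    """Soft attention: weight `keys` (B, N, D) by similarity to `query` (B, D)."""
    scores = torch.bmm(keys, query.unsqueeze(-1)).squeeze(-1)   # (B, N)
    alpha = F.softmax(scores, dim=-1)
    return torch.bmm(alpha.unsqueeze(1), keys).squeeze(1)       # (B, D)

def dvan_stages(q, hist, v_global, v_local):
    """Hedged sketch of DVAN's staged attention (the paper's modules are richer).
    q: (B, D) question; hist, v_global, v_local: (B, N, D) feature sets."""
    h_ctx = attend(q, hist)                 # 1) textual attention -> H'
    txt_q = q + h_ctx                       # question enriched with history
    vg = attend(txt_q, v_global)            # 2) global visual hints Vg'
    vl = attend(txt_q, v_local)             #    local visual hints Vl'
    vg2 = attend(vl, v_global)              # 3) dual crossing attention:
    vl2 = attend(vg, v_local)               #    each side queries the other
    return torch.cat([txt_q, vg2, vl2], dim=-1)  # fused features for answering
```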


#17 AmazonQA: A Review-Based Question Answering Task [PDF] [Copy] [Kimi] [REL]

Authors: Mansi Gupta, Nitish Kulkarni, Raghuveer Chanda, Anirudha Rayasam, Zachary C. Lipton

Every day, thousands of customers post questions on Amazon product pages. After some time, if they are fortunate, a knowledgeable customer might answer their question. Observing that many questions can be answered based upon the available product reviews, we propose the task of review-based QA. Given a corpus of reviews and a question, the QA system synthesizes an answer. To this end, we introduce a new dataset and propose a method that combines information retrieval techniques for selecting relevant reviews (given a question) and "reading comprehension" models for synthesizing an answer (given a question and review). Our dataset consists of 923k questions, 3.6M answers and 14M reviews across 156k products. Building on the well-known Amazon dataset, we additionally collect annotations marking each question as either answerable or unanswerable based on the available reviews. A deployed system could first classify a question as answerable before attempting to generate a provisional answer. Notably, unlike many popular QA datasets, here the questions, passages, and answers are all extracted from real human interactions. We evaluate a number of models for answer generation and propose strong baselines, demonstrating the challenging nature of this new task.
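
A minimal sketch of the retrieval stage using TF-IDF ranking (one simple IR choice; the paper evaluates several); a reading-comprehension model would then synthesize an answer from the top-ranked reviews:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import linear_kernel

def top_reviews(question, reviews, k=5):
    """Hedged sketch of retrieve-then-read: rank a product's reviews by
    TF-IDF similarity to the question and return the top k as reader input."""
    vec = TfidfVectorizer(stop_words="english")
    mat = vec.fit_transform([question] + reviews)
    sims = linear_kernel(mat[0:1], mat[1:]).ravel()   # cosine on TF-IDF vectors
    order = sims.argsort()[::-1][:k]
    return [reviews[i] for i in order]
```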


#18 Answering Binary Causal Questions Through Large-Scale Text Mining: An Evaluation Using Cause-Effect Pairs from Human Experts [PDF] [Copy] [Kimi] [REL]

Authors: Oktie Hassanzadeh, Debarun Bhattacharjya, Mark Feblowitz, Kavitha Srinivas, Michael Perrone, Shirin Sohrabi, Michael Katz

In this paper, we study the problem of answering questions of type "Could X cause Y?" where X and Y are general phrases without any constraints. Answering such questions will assist with various decision analysis tasks such as verifying and extending presumed causal associations used for decision making. Our goal is to analyze the ability of an AI agent built using state-of-the-art unsupervised methods in answering causal questions derived from collections of cause-effect pairs from human experts. We focus only on unsupervised and weakly supervised methods due to the difficulty of creating a large enough training set with a reasonable quality and coverage. The methods we examine rely on a large corpus of text derived from news articles, and include methods ranging from large-scale application of classic NLP techniques and statistical analysis to the use of neural network based phrase embeddings and state-of-the-art neural language models.


#19 GSN: A Graph-Structured Network for Multi-Party Dialogues [PDF] [Copy] [Kimi] [REL]

Authors: Wenpeng Hu, Zhangming Chan, Bing Liu, Dongyan Zhao, Jinwen Ma, Rui Yan

Existing neural models for dialogue response generation assume that utterances are sequentially organized. However, many real-world dialogues involve multiple interlocutors (i.e., multi-party dialogues), where the assumption does not hold as utterances from different interlocutors can occur "in parallel." This paper generalizes existing sequence-based models to a Graph-Structured neural Network (GSN) for dialogue modeling. The core of GSN is a graph-based encoder that can model the information flow along graph-structured dialogues (two-party sequential dialogues are a special case). Experimental results show that GSN significantly outperforms existing sequence-based models.
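
A minimal sketch of one round of graph-structured message passing over the reply graph (GSN's actual encoder is richer); a two-party sequential dialogue is the special case where the edges form a chain:

```python
import torch
import torch.nn as nn

class GraphDialogueEncoder(nn.Module):
    """Hedged sketch of graph-structured dialogue encoding: each utterance
    state is updated from the states of the utterances it replies to."""
    def __init__(self, dim):
        super().__init__()
        self.update = nn.GRUCell(dim, dim)

    def forward(self, states, edges):
        """states: (N, D) utterance encodings; edges: list of (src, dst)
        pairs meaning utterance dst replies to utterance src."""
        msgs = torch.zeros_like(states)
        deg = states.new_zeros(states.size(0), 1)
        for src, dst in edges:
            msgs[dst] += states[src]
            deg[dst] += 1
        msgs = msgs / deg.clamp(min=1)          # mean over predecessors
        return self.update(msgs, states)        # message-conditioned update
```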


#20 Leap-LSTM: Enhancing Long Short-Term Memory for Text Categorization [PDF] [Copy] [Kimi] [REL]

Authors: Ting Huang, Gehui Shen, Zhi-Hong Deng

Recurrent Neural Networks (RNNs) are widely used in the field of natural language processing (NLP), ranging from text categorization to question answering and machine translation. However, RNNs generally read the whole text from beginning to end (or sometimes in reverse), which makes it inefficient to process long texts. When reading a long document for a categorization task, such as topic categorization, large quantities of words are irrelevant and can be skipped. To this end, we propose Leap-LSTM, an LSTM-enhanced model that dynamically leaps between words while reading texts. At each step, we utilize several feature encoders to extract messages from the preceding text, the following text and the current word, and then determine whether to skip the current word. We evaluate Leap-LSTM on several text categorization tasks: sentiment analysis, news categorization, ontology classification and topic classification, on five benchmark datasets. The experimental results show that our model reads faster and predicts better than a standard LSTM. Compared to previous models that can also skip words, our model achieves better trade-offs between performance and efficiency.
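
A hedged sketch of the skip decision, with deliberately simplified feature encoders (a mean-pooled preview of the following words is our assumption):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SkipGate(nn.Module):
    """Hedged sketch of a Leap-LSTM-style skip decision: fuse features of
    the preceding text (the LSTM state), the current word, and a cheap
    encoding of the following words, then predict skip/keep."""
    def __init__(self, hid, emb_dim):
        super().__init__()
        self.score = nn.Linear(hid + 2 * emb_dim, 2)

    def forward(self, h_prev, w_emb, following_embs):
        ahead = following_embs.mean(dim=1)            # crude preview of next words
        logits = self.score(torch.cat([h_prev, w_emb, ahead], dim=-1))
        return F.softmax(logits, dim=-1)              # P(skip), P(keep)

# In the full model, "skip" jumps to the next word without an LSTM update,
# so long documents are read in fewer recurrent steps.
```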


#21 Relation Extraction Using Supervision from Topic Knowledge of Relation Labels [PDF] [Copy] [Kimi] [REL]

Authors: Haiyun Jiang, Li Cui, Zhe Xu, Deqing Yang, Jindong Chen, Chenguang Li, Jingping Liu, Jiaqing Liang, Chao Wang, Yanghua Xiao, Wei Wang

Explicitly exploring the semantics of a relation is important for high-accuracy relation extraction but has not been fully studied in previous work. In this paper, we mine the topic knowledge of a relation to explicitly represent its semantics, and model relation extraction as a matching problem: the matching score between a sentence and a candidate relation is predicted for an entity pair. To this end, we propose a deep matching network to precisely model the semantic similarity of a sentence-relation pair. Besides, the topic knowledge also allows us to derive the importance of training samples as well as two knowledge-guided negative sampling strategies for the training process. We conduct extensive experiments to evaluate the proposed framework and observe improvements of 11.5% in AUC and 5.4% in max F1 over state-of-the-art baselines.


#22 Representation Learning with Weighted Inner Product for Universal Approximation of General Similarities [PDF] [Copy] [Kimi] [REL]

Authors: Geewook Kim, Akifumi Okuno, Kazuki Fukui, Hidetoshi Shimodaira

We propose weighted inner product similarity (WIPS) for neural network-based graph embedding. In addition to the parameters of neural networks, we optimize the weights of the inner product by allowing positive and negative values. Despite its simplicity, WIPS can approximate arbitrary general similarities including positive definite, conditionally positive definite, and indefinite kernels. WIPS is free from similarity model selection, since it can learn any similarity models such as cosine similarity, negative Poincaré distance and negative Wasserstein distance. Our experiments show that the proposed method can learn high-quality distributed representations of nodes from real datasets, leading to an accurate approximation of similarities as well as high performance in inductive tasks.
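
The similarity itself can be stated in a few lines: WIPS scores a pair as a weighted inner product whose per-dimension weights are learned jointly with the embedding network and may be negative, which is what lets it express indefinite similarities. A sketch (the all-ones initialization is an assumption):

```python
import torch
import torch.nn as nn

class WIPS(nn.Module):
    """Weighted inner product similarity: <x, y>_w = sum_i w_i * x_i * y_i,
    with signed, learnable weights w optimized alongside the embeddings."""
    def __init__(self, dim):
        super().__init__()
        self.w = nn.Parameter(torch.ones(dim))   # signed, learnable weights

    def forward(self, x, y):
        return (self.w * x * y).sum(dim=-1)
```

With all-positive weights this reduces to an ordinary (positive definite) inner product; a mixed-sign weight vector yields indefinite similarities, so no similarity model has to be chosen in advance.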


#23 Incorporating Structural Information for Better Coreference Resolution [PDF] [Copy] [Kimi] [REL]

Authors: Fang Kong, Fu Jian

Coreference resolution plays an important role in text understanding. In the literature, various neural approaches have been proposed and have achieved considerable success. However, structural information, which has been proven useful for coreference resolution, has been largely ignored in previous neural approaches. In this paper, we focus on effectively incorporating structural information into neural coreference resolution from three aspects. Firstly, nodes in the parse trees are employed as a constraint to filter out impossible text spans (i.e., mention candidates), reducing the computational complexity. Secondly, contextual information is encoded over the traversal node sequence instead of the word sequence to better capture hierarchical information for text span representation. Lastly, additional structural features (e.g., the path, siblings, degree and category of the current node) are encoded to enhance the mention representation. Experiments on the CoNLL-2012 Shared Task dataset show the effectiveness of our proposed approach in incorporating structural information into neural coreference resolution.
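
A small sketch of the first aspect: enumerating only the spans that correspond to parse-tree constituents instead of all O(n²) spans (the paper's exact filtering rules may differ):

```python
from nltk import Tree

def constituent_spans(tree):
    """Hedged sketch: collect (start, end) token spans of parse-tree nodes
    to use as the only mention candidates."""
    spans, pos = set(), 0

    def walk(node):
        nonlocal pos
        if isinstance(node, str):        # leaf token
            pos += 1
            return
        start = pos
        for child in node:
            walk(child)
        spans.add((start, pos))          # this constituent's token span

    walk(tree)
    return sorted(spans)

# e.g. constituent_spans(Tree.fromstring("(S (NP (DT the) (NN cat)) (VP (VBD sat)))"))
# -> [(0, 1), (0, 2), (0, 3), (1, 2), (2, 3)]
```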


#24 Knowledge Base Question Answering with Topic Units [PDF] [Copy] [Kimi] [REL]

Authors: Yunshi Lan, Shuohang Wang, Jing Jiang

Knowledge base question answering (KBQA) is an important task in natural language processing. Existing methods for KBQA usually start with entity linking, which considers mostly named entities found in a question as the starting points in the KB to search for answers to the question. However, relying only on entity linking to look for answer candidates may not be sufficient. In this paper, we propose to perform topic unit linking where topic units cover a wider range of units of a KB. We use a generation-and-scoring approach to gradually refine the set of topic units. Furthermore, we use reinforcement learning to jointly learn the parameters for topic unit linking and answer candidate ranking in an end-to-end manner. Experiments on three commonly used benchmark datasets show that our method consistently works well and outperforms the previous state of the art on two datasets.


#25 Adversarial Transfer for Named Entity Boundary Detection with Pointer Networks [PDF] [Copy] [Kimi] [REL]

Authors: Jing Li, Deheng Ye, Shuo Shang

In this paper, we focus on named entity boundary detection, which aims to detect the start and end boundaries of an entity mention in text without predicting its type. A more accurate and robust detection approach is desired to alleviate error propagation in downstream applications, such as entity linking and fine-grained typing systems. Here, we first develop a novel entity boundary labeling approach with pointer networks, whose output dictionary size depends on the variable-length input. Furthermore, we propose AT-Bdry, which incorporates adversarial transfer learning into an end-to-end sequence labeling model to encourage domain-invariant representations. More importantly, AT-Bdry can reduce the domain difference in data distributions between the source and target domains via an unsupervised transfer learning approach (i.e., no annotated target-domain data is necessary). We conduct Formal-Text-to-Formal-Text, Formal-Text-to-Informal-Text and ablation evaluations on five benchmark datasets. Experimental results show that AT-Bdry achieves state-of-the-art transfer performance against recent baselines.
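
Adversarial transfer of this kind is commonly implemented with a gradient reversal layer between the encoder and a domain discriminator; a sketch of that standard trick (the paper's exact adversarial setup may differ):

```python
import torch

class GradReverse(torch.autograd.Function):
    """Gradient reversal layer: identity on the forward pass, negated
    (scaled) gradient on the backward pass, so the encoder learns to fool a
    domain discriminator and thus produces domain-invariant features."""
    @staticmethod
    def forward(ctx, x, lam):
        ctx.lam = lam
        return x.view_as(x)

    @staticmethod
    def backward(ctx, grad_output):
        return -ctx.lam * grad_output, None

def grad_reverse(x, lam=1.0):
    return GradReverse.apply(x, lam)

# Usage: domain_logits = discriminator(grad_reverse(encoder_states))
```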