AAAI.2016 - NLP and Machine Learning

| Total: 28

#1 Extracting Biomolecular Interactions Using Semantic Parsing of Biomedical Text

Authors: Sahil Garg, Aram Galstyan, Ulf Hermjakob, Daniel Marcu

We advance the state of the art in biomolecular interaction extraction with three contributions: (i) We show that deep Abstract Meaning Representations (AMR) significantly improve the accuracy of a biomolecular interaction extraction system when compared to a baseline that relies solely on surface- and syntax-based features; (ii) In contrast with previous approaches that infer relations on a sentence-by-sentence basis, we expand our framework to enable consistent predictions over sets of sentences (documents); (iii) We further modify and expand a graph kernel learning framework to enable concurrent exploitation of automatically induced AMR (semantic) and dependency structure (syntactic) representations. Our experiments show that our approach yields interaction extraction systems that are more robust in environments where there is a significant mismatch between training and test conditions.


#2 Inside Out: Two Jointly Predictive Models for Word Representations and Phrase Representations

Authors: Fei Sun, Jiafeng Guo, Yanyan Lan, Jun Xu, Xueqi Cheng

The distributional hypothesis lies at the root of most existing word representation models, which infer a word's meaning from its external contexts. However, distributional models cannot handle rare and morphologically complex words well, and they fail to capture some fine-grained linguistic regularities because they ignore word forms. Morphology, in contrast, holds that words are built from basic units, i.e., morphemes. The meaning and function of rare words can therefore be inferred from words sharing the same morphemes, and many syntactic relations can be identified directly from word forms. The limitation of morphology, however, is that it cannot relate two words that share no morphemes. Considering the advantages and limitations of both approaches, we propose two novel models, called BEING and SEING, that build better word representations by modeling both external contexts and internal morphemes in a jointly predictive way. These two models can also be extended to learn phrase representations following distributed morphology theory. We evaluate the proposed models on similarity and analogy tasks. The results demonstrate that the proposed models significantly outperform state-of-the-art models on both word and phrase representation learning.
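
A minimal sketch of the joint idea (not the authors' BEING/SEING implementation): a word's representation is assumed to be its word embedding plus the sum of its morpheme embeddings, trained to predict context words with skip-gram-style negative sampling. The vocabulary, morpheme segmentation and hyperparameters below are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
DIM = 50

# Toy vocabulary and a hypothetical morpheme segmentation (illustrative only).
vocab = ["unhappiness", "happiness", "unkind", "kindness", "the", "cat"]
morphemes = {"unhappiness": ["un", "happy", "ness"],
             "happiness": ["happy", "ness"],
             "unkind": ["un", "kind"],
             "kindness": ["kind", "ness"],
             "the": ["the"], "cat": ["cat"]}

word_idx = {w: i for i, w in enumerate(vocab)}
morph_list = sorted({m for ms in morphemes.values() for m in ms})
morph_idx = {m: i for i, m in enumerate(morph_list)}

W = rng.normal(0, 0.1, (len(vocab), DIM))       # word-level ("external") embeddings
M = rng.normal(0, 0.1, (len(morph_list), DIM))  # morpheme ("internal") embeddings
C = rng.normal(0, 0.1, (len(vocab), DIM))       # context (output) embeddings

def compose(word):
    """Word representation = word embedding + sum of its morpheme embeddings."""
    return W[word_idx[word]] + M[[morph_idx[m] for m in morphemes[word]]].sum(axis=0)

def sgd_step(center, context, negatives, lr=0.05):
    """One skip-gram step with negative sampling on the composed representation."""
    v = compose(center)
    for w, label in [(context, 1.0)] + [(n, 0.0) for n in negatives]:
        u = C[word_idx[w]].copy()
        p = 1.0 / (1.0 + np.exp(-u @ v))
        g = p - label
        C[word_idx[w]] -= lr * g * v
        grad_v = lr * g * u
        W[word_idx[center]] -= grad_v          # gradient flows to the word embedding...
        for m in morphemes[center]:            # ...and to each of its morphemes
            M[morph_idx[m]] -= grad_v

sgd_step("unhappiness", "cat", negatives=["the", "kindness"])
```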


#3 Implicit Discourse Relation Classification via Multi-Task Neural Networks

Authors: Yang Liu, Sujian Li, Xiaodong Zhang, Zhifang Sui

Without discourse connectives, classifying implicit discourse relations is a challenging task and a bottleneck for building a practical discourse parser. Previous research usually makes use of a single discourse framework, such as PDTB or RST, to improve classification performance on discourse relations. In fact, multiple corpora annotated under different discourse frameworks exist and are internally connected. To exploit the combination of these corpora, we design related discourse classification tasks specific to each corpus, and propose a novel Convolutional Neural Network-embedded multi-task learning system that synthesizes these tasks by learning both unique and shared representations for each task. Experimental results on the PDTB implicit discourse relation classification task demonstrate that our model achieves significant gains over baseline systems.
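
A rough sketch of the shared/task-specific representation pattern under illustrative assumptions (a shared convolutional filter bank plus one private filter bank and classifier head per task); the authors' actual architecture and training objective are not reproduced here.

```python
import numpy as np

rng = np.random.default_rng(1)
DIM, FILTERS, WINDOW, N_TASKS = 50, 20, 3, 2

# Shared filters used by every task, plus one private filter bank per task
# (an illustrative stand-in for the shared/unique representations in the paper).
shared_F = rng.normal(0, 0.1, (FILTERS, WINDOW * DIM))
private_F = [rng.normal(0, 0.1, (FILTERS, WINDOW * DIM)) for _ in range(N_TASKS)]
heads = [rng.normal(0, 0.1, (4, 2 * FILTERS)) for _ in range(N_TASKS)]  # 4 relation classes per task

def conv_maxpool(embs, filters):
    """1-D convolution over word embeddings followed by max-over-time pooling."""
    windows = [embs[i:i + WINDOW].reshape(-1) for i in range(len(embs) - WINDOW + 1)]
    feats = np.maximum(0.0, np.stack(windows) @ filters.T)  # ReLU feature maps
    return feats.max(axis=0)

def predict(embs, task):
    rep = np.concatenate([conv_maxpool(embs, shared_F),          # shared representation
                          conv_maxpool(embs, private_F[task])])  # task-specific representation
    logits = heads[task] @ rep
    exp = np.exp(logits - logits.max())
    return exp / exp.sum()

sentence = rng.normal(size=(12, DIM))   # 12 words, pretrained embeddings assumed
print(predict(sentence, task=0))
```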


#4 Joint Word Representation Learning Using a Corpus and a Semantic Lexicon

Authors: Danushka Bollegala, Mohammed Alsuhaibani, Takanori Maehara, Ken-ichi Kawarabayashi

Methods for learning word representations from large text corpora have received much attention lately due to their impressive performance in numerous natural language processing (NLP) tasks, such as semantic similarity measurement and word analogy detection. Despite their success, these data-driven word representation learning methods do not consider the rich semantic relational structure between words in a co-occurring context. On the other hand, much manual effort has already gone into the construction of semantic lexicons such as WordNet, which represent the meanings of words by defining the various relationships that exist among the words in a language. We consider the question: can we improve the word representations learnt from a corpus by integrating knowledge from semantic lexicons? For this purpose, we propose a joint word representation learning method that simultaneously predicts the co-occurrences of two words in a sentence subject to the relational constraints given by the semantic lexicon. We use the relations that exist between words in the lexicon to regularize the word representations learnt from the corpus. Our proposed method statistically significantly outperforms previously proposed methods for incorporating semantic lexicons into word representations on several benchmark datasets for semantic similarity and word analogy.
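
The regularization pattern can be sketched as a toy joint objective: a co-occurrence reconstruction term plus a penalty that pulls lexicon-related word vectors together. The loss form, counts and lexicon below are illustrative assumptions, not the paper's exact objective.

```python
import numpy as np

rng = np.random.default_rng(2)
vocab = ["car", "automobile", "vehicle", "banana"]
idx = {w: i for i, w in enumerate(vocab)}
W = rng.normal(0, 0.1, (len(vocab), 50))

# Toy corpus statistics and lexicon relations (e.g., WordNet synonyms).
cooc = {("car", "vehicle"): 30.0, ("car", "banana"): 1.0}
lexicon = [("car", "automobile"), ("automobile", "vehicle")]

def loss_and_grad(W, lam=0.1):
    grad = np.zeros_like(W)
    loss = 0.0
    # Corpus term: squared error between the dot product and log co-occurrence count.
    for (a, b), n in cooc.items():
        i, j = idx[a], idx[b]
        err = W[i] @ W[j] - np.log(n)
        loss += 0.5 * err ** 2
        grad[i] += err * W[j]
        grad[j] += err * W[i]
    # Lexicon regularizer: related words are pulled toward each other.
    for a, b in lexicon:
        i, j = idx[a], idx[b]
        diff = W[i] - W[j]
        loss += 0.5 * lam * diff @ diff
        grad[i] += lam * diff
        grad[j] -= lam * diff
    return loss, grad

for _ in range(200):                    # plain gradient descent on the joint objective
    l, g = loss_and_grad(W)
    W -= 0.05 * g
```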


#5 Text Matching as Image Recognition

Authors: Liang Pang, Yanyan Lan, Jiafeng Guo, Jun Xu, Shengxian Wan, Xueqi Cheng

Matching two texts is a fundamental problem in many natural language processing tasks. An effective way is to extract meaningful matching patterns from words, phrases, and sentences to produce the matching score. Inspired by the success of convolutional neural network in image recognition, where neurons can capture many complicated patterns based on the extracted elementary visual patterns such as oriented edges and corners, we propose to model text matching as the problem of image recognition. Firstly, a matching matrix whose entries represent the similarities between words is constructed and viewed as an image. Then a convolutional neural network is utilized to capture rich matching patterns in a layer-by-layer way. We show that by resembling the compositional hierarchies of patterns in image recognition, our model can successfully identify salient signals such as n-gram and n-term matchings. Experimental results demonstrate its superiority against the baselines.
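
A hedged sketch of the matching-matrix idea: build a word-by-word cosine similarity matrix from (assumed pretrained) embeddings and treat it as an image, here with a single convolution-plus-pooling layer and random weights in place of the trained, multi-layer network.

```python
import numpy as np

rng = np.random.default_rng(3)

def matching_matrix(sent_a, sent_b, emb):
    """Entry (i, j) is the cosine similarity between word i of A and word j of B."""
    A = np.stack([emb[w] for w in sent_a])
    B = np.stack([emb[w] for w in sent_b])
    A /= np.linalg.norm(A, axis=1, keepdims=True)
    B /= np.linalg.norm(B, axis=1, keepdims=True)
    return A @ B.T                       # the "image" fed to the convolutional layer

def conv2d_maxpool(img, kernel):
    """One valid 2-D convolution followed by ReLU and global max pooling."""
    kh, kw = kernel.shape
    H, W = img.shape
    out = np.array([[np.sum(img[i:i + kh, j:j + kw] * kernel)
                     for j in range(W - kw + 1)]
                    for i in range(H - kh + 1)])
    return np.maximum(0.0, out).max()

emb = {w: rng.normal(size=50) for w in
       "down the ages noodles and dumplings were popular in china".split()}
sent_a = "down the ages noodles and dumplings were popular in china".split()
sent_b = "down the ages dumplings and noodles were popular in china".split()

M = matching_matrix(sent_a, sent_b, emb)
kernels = [rng.normal(size=(2, 2)) for _ in range(8)]   # 8 random filters for illustration
features = np.array([conv2d_maxpool(M, k) for k in kernels])
score = features @ rng.normal(size=8)    # a trained MLP would replace this random readout
```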


#6 A Deep Architecture for Semantic Matching with Multiple Positional Sentence Representations

Authors: Shengxian Wan, Yanyan Lan, Jiafeng Guo, Jun Xu, Liang Pang, Xueqi Cheng

Matching natural language sentences is central to many applications such as information retrieval and question answering. Existing deep models rely on a single sentence representation or multiple granularity representations for matching. However, such methods cannot adequately capture the contextualized local information in the matching process. To tackle this problem, we present a new deep architecture to match two sentences with multiple positional sentence representations. Specifically, each positional sentence representation is a sentence representation at a given position, generated by a bidirectional long short-term memory (Bi-LSTM). The matching score is finally produced by aggregating interactions between these different positional sentence representations, through k-Max pooling and a multi-layer perceptron. Our model has several advantages: (1) by using Bi-LSTM, rich context of the whole sentence is leveraged to capture the contextualized local information in each positional sentence representation; (2) by matching with multiple positional sentence representations, the model can flexibly aggregate the different important contextualized local information in a sentence to support the matching; (3) experiments on different tasks such as question answering and sentence completion demonstrate the superiority of our model.
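
A compact sketch of the matching step, assuming the positional sentence representations (Bi-LSTM states) are already computed: cosine interactions between all position pairs, k-max pooling, and a small MLP readout with placeholder weights.

```python
import numpy as np

rng = np.random.default_rng(4)
HID = 64

def k_max(values, k):
    """Top-k values of a flattened interaction matrix, in descending order."""
    return np.sort(values.ravel())[::-1][:k]

def match_score(pos_a, pos_b, k=5):
    """pos_a, pos_b: (len, HID) positional sentence representations (e.g. Bi-LSTM states)."""
    a = pos_a / np.linalg.norm(pos_a, axis=1, keepdims=True)
    b = pos_b / np.linalg.norm(pos_b, axis=1, keepdims=True)
    interactions = a @ b.T                  # cosine interaction between all position pairs
    feats = k_max(interactions, k)          # keep the k strongest local matching signals
    # Tiny MLP readout with illustrative random weights (trained jointly in the paper).
    h = np.tanh(W1 @ feats + b1)
    return float(W2 @ h + b2)

W1, b1 = rng.normal(0, 0.1, (16, 5)), np.zeros(16)
W2, b2 = rng.normal(0, 0.1, 16), 0.0

pos_a = rng.normal(size=(10, HID))          # stand-ins for Bi-LSTM outputs of sentence A
pos_b = rng.normal(size=(12, HID))          # and of sentence B
print(match_score(pos_a, pos_b))
```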


#7 Syntactic Skeleton-Based Translation

Authors: Tong Xiao, Jingbo Zhu, Chunliang Zhang, Tongran Liu

In this paper we propose an approach to modeling syntactically-motivated skeletal structure of source sentence for machine translation. This model allows for application of high-level syntactic transfer rules and low-level non-syntactic rules. It thus involves fully syntactic, non-syntactic, and partially syntactic derivations via a single grammar and decoding paradigm. On large-scale Chinese-English and English-Chinese translation tasks, we obtain an average improvement of +0.9 BLEU across the newswire and web genres.


#8 What Happens Next? Event Prediction Using a Compositional Neural Network Model

Authors: Mark Granroth-Wilding, Stephen Clark

We address the problem of automatically acquiring knowledge of event sequences from text, with the aim of providing a predictive model for use in narrative generation systems. We present a neural network model that simultaneously learns embeddings for words describing events, a function to compose the embeddings into a representation of the event, and a coherence function to predict the strength of association between two events. We introduce a new development of the narrative cloze evaluation task, better suited to a setting where rich information about events is available. We compare models that learn vector-space representations of the events denoted by verbs in chains centering on a single protagonist. We find that recent work on learning vector-space embeddings to capture word meaning can be effectively applied to this task, including simple incorporation of a verb's arguments in the representation by vector addition. These representations provide a good initialization for learning the richer, compositional model of events with a neural network, vastly outperforming a number of baselines and competitive alternatives.
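
The additive-composition baseline mentioned in the abstract is easy to sketch: an event vector is the verb vector plus its argument vectors, and a coherence function scores candidate next events. The logistic dot product below is a stand-in for the learned neural coherence function, and the embeddings are random placeholders.

```python
import numpy as np

rng = np.random.default_rng(5)
emb = {w: rng.normal(size=50) for w in
       ["order", "eat", "pay", "leave", "food", "bill", "restaurant"]}

def event_vec(verb, args):
    """Simple composition from the abstract's baseline: verb vector plus argument vectors."""
    return emb[verb] + sum((emb[a] for a in args), np.zeros(50))

def coherence(e1, e2):
    """Strength of association between two events; here a logistic dot product
    (the paper instead learns this function with a neural network)."""
    return 1.0 / (1.0 + np.exp(-e1 @ e2))

chain = [("order", ["food"]), ("eat", ["food"])]          # observed events for one protagonist
candidates = [("pay", ["bill"]), ("leave", ["restaurant"])]

context = sum((event_vec(v, a) for v, a in chain), np.zeros(50))
ranked = sorted(candidates, key=lambda c: -coherence(context, event_vec(*c)))
print(ranked[0])   # predicted next event under this toy scoring
```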


#9 Semi-Supervised Multinomial Naive Bayes for Text Classification by Leveraging Word-Level Statistical Constraint

Authors: Li Zhao, Minlie Huang, Ziyu Yao, Rongwei Su, Yingying Jiang, Xiaoyan Zhu

Multinomial Naive Bayes with Expectation Maximization (MNB-EM) is a standard semi-supervised learning method to augment Multinomial Naive Bayes (MNB) for text classification. Despite its success, MNB-EM is not stable, and may either succeed or fail to improve MNB. We believe this is because MNB-EM lacks the ability to preserve the class distribution on words. In this paper, we propose a novel method to augment MNB-EM by leveraging word-level statistical constraints to preserve the class distribution on words. The word-level statistical constraints are further converted into constraints on document posteriors generated by MNB-EM. Experiments demonstrate that our method consistently improves MNB-EM and outperforms state-of-the-art baselines remarkably.
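
For reference, a sketch of the plain MNB-EM baseline that the paper augments (the word-level class-distribution constraint itself is the paper's contribution and is not reproduced here); toy count data stands in for real documents.

```python
import numpy as np

def mnb_em(X_lab, y_lab, X_unl, n_classes, n_iter=10, alpha=1.0):
    """Semi-supervised Multinomial Naive Bayes via EM over word-count matrices.
    X_*: (docs, vocab) count arrays; y_lab: class ids for the labelled docs."""
    # Initialise responsibilities from labels (hard) and uniformly for unlabelled docs.
    R_lab = np.eye(n_classes)[y_lab]
    R_unl = np.full((X_unl.shape[0], n_classes), 1.0 / n_classes)
    X = np.vstack([X_lab, X_unl])
    for _ in range(n_iter):
        R = np.vstack([R_lab, R_unl])
        # M-step: class priors and per-class word distributions with Laplace smoothing.
        prior = R.sum(axis=0) / R.sum()
        word_counts = R.T @ X + alpha
        theta = word_counts / word_counts.sum(axis=1, keepdims=True)
        # E-step: posteriors for unlabelled documents only (labelled ones stay fixed).
        log_post = np.log(prior) + X_unl @ np.log(theta).T
        log_post -= log_post.max(axis=1, keepdims=True)
        R_unl = np.exp(log_post)
        R_unl /= R_unl.sum(axis=1, keepdims=True)
    return prior, theta

rng = np.random.default_rng(6)
prior, theta = mnb_em(rng.poisson(1.0, (20, 100)), rng.integers(0, 2, 20),
                      rng.poisson(1.0, (80, 100)), n_classes=2)
```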


#10 Jointly Modeling Topics and Intents with Global Order Structure

Authors: Bei Chen, Jun Zhu, Nan Yang, Tian Tian, Ming Zhou, Bo Zhang

Modeling document structure is of great importance for discourse analysis and related applications. The goal of this research is to capture the document intent structure by modeling documents as a mixture of topic words and rhetorical words. While topics remain relatively unchanged throughout a document, the rhetorical functions of sentences usually change following certain orders in discourse. We propose GMM-LDA, a topic-modeling-based Bayesian unsupervised model, to analyze document intent structure together with order information. Our model is flexible in that it can incorporate annotations and perform supervised learning. Additionally, entropic regularization can be introduced to model the significant divergence between topics and intents. We perform experiments in both unsupervised and supervised settings; the results show the superiority of our model over several state-of-the-art baselines.


#11 Learning Statistical Scripts with LSTM Recurrent Neural Networks

Authors: Karl Pichotta, Raymond Mooney

Scripts encode knowledge of prototypical sequences of events. We describe a Recurrent Neural Network model for statistical script learning using Long Short-Term Memory, an architecture which has been demonstrated to work well on a range of Artificial Intelligence tasks. We evaluate our system on two tasks, inferring held-out events from text and inferring novel events from text, substantially outperforming prior approaches on both tasks.


#12 Convolution Kernels for Discriminative Learning from Streaming Text

Authors: Michal Lukasik, Trevor Cohn

Time series modeling is an important problem with many applications in different domains. Here we consider discriminative learning from time series, where we seek to predict an output response variable based on time series input. We develop a method based on convolution kernels to model discriminative learning over streams of text. Our method outperforms competitive baselines on three synthetic and two real datasets, in rumour frequency modeling and popularity prediction tasks.


#13 Inferring Interpersonal Relations in Narrative Summaries

Authors: Shashank Srivastava, Snigdha Chaturvedi, Tom Mitchell

Characterizing relationships between people is fundamental for the understanding of narratives. In this work, we address the problem of inferring the polarity of relationships between people in narrative summaries. We formulate the problem as a joint structured prediction for each narrative, and present a general model that combines evidence from linguistic and semantic features, as well as features based on the structure of the social community in the text. We additionally provide a clustering-based approach that can exploit regularities in narrative types, e.g., learning an affinity for love triangles in romantic stories. On a dataset of movie summaries from Wikipedia, our structured models provide more than 30% error reduction over a competitive baseline that considers pairs of characters in isolation.


#14 Siamese Recurrent Architectures for Learning Sentence Similarity

Authors: Jonas Mueller, Aditya Thyagarajan

We present a siamese adaptation of the Long Short-Term Memory (LSTM) network for labeled data comprised of pairs of variable-length sequences. Our model is applied to assess semantic similarity between sentences, where we exceed state of the art, outperforming carefully handcrafted features and recently proposed neural network systems of greater complexity. For these applications, we provide word-embedding vectors supplemented with synonymic information to the LSTMs, which use a fixed size vector to encode the underlying meaning expressed in a sentence (irrespective of the particular wording/syntax). By restricting subsequent operations to rely on a simple Manhattan metric, we compel the sentence representations learned by our model to form a highly structured space whose geometry reflects complex semantic relationships. Our results are the latest in a line of findings that showcase LSTMs as powerful language models capable of tasks requiring intricate understanding.
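
The Manhattan-metric scoring can be sketched in a few lines: similarity is taken as the exponential of the negative L1 distance between the two fixed-size sentence encodings, which maps identical encodings to 1 and distant ones toward 0. The encodings below are random stand-ins for the LSTM outputs, and the exponential form is one common choice rather than a statement of the paper's exact function.

```python
import numpy as np

def manhattan_similarity(h_a, h_b):
    """Similarity of two fixed-size sentence encodings under an L1 (Manhattan) metric.
    exp(-||h_a - h_b||_1) maps distance 0 to similarity 1 and large distances toward 0."""
    return float(np.exp(-np.abs(h_a - h_b).sum()))

rng = np.random.default_rng(7)
h_a = rng.normal(0, 0.05, 50)       # stand-ins for the final LSTM hidden states
h_b = h_a + rng.normal(0, 0.01, 50)
print(manhattan_similarity(h_a, h_b))
```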


#15 Building Earth Mover's Distance on Bilingual Word Embeddings for Machine Translation

Authors: Meng Zhang, Yang Liu, Huanbo Luan, Maosong Sun, Tatsuya Izuha, Jie Hao

Following their monolingual counterparts, bilingual word embeddings are also on the rise. As a major application task, word translation has been relying on the nearest neighbor to connect embeddings cross-lingually. However, the nearest neighbor strategy suffers from its inherently local nature and fails to cope with variations in realistic bilingual word embeddings. Furthermore, it lacks a mechanism to deal with many-to-many mappings that often show up across languages. We introduce Earth Mover's Distance to this task by providing a natural formulation that translates words in a holistic fashion, addressing the limitations of the nearest neighbor. We further extend the formulation to a new task of identifying parallel sentences, which is useful for statistical machine translation systems, thereby expanding the application realm of bilingual word embeddings. We show encouraging performance on both tasks.
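
An illustrative optimal-transport sketch (using Sinkhorn iterations as an entropically regularized approximation, which may differ from the paper's exact formulation): with uniform mass on each word, the transport plan between source and target embeddings pairs likely translations holistically rather than by nearest neighbor.

```python
import numpy as np

def sinkhorn_plan(src_emb, tgt_emb, reg=0.05, n_iter=200):
    """Approximate optimal transport between two embedding sets (uniform word masses).
    Returns the transport plan; large entries indicate likely translation pairs."""
    src = src_emb / np.linalg.norm(src_emb, axis=1, keepdims=True)
    tgt = tgt_emb / np.linalg.norm(tgt_emb, axis=1, keepdims=True)
    cost = 1.0 - src @ tgt.T                 # cosine distance between all word pairs
    K = np.exp(-cost / reg)
    a = np.full(len(src_emb), 1.0 / len(src_emb))
    b = np.full(len(tgt_emb), 1.0 / len(tgt_emb))
    u = np.ones_like(a)
    for _ in range(n_iter):                  # alternating marginal scaling
        v = b / (K.T @ u)
        u = a / (K @ v)
    return u[:, None] * K * v[None, :]

rng = np.random.default_rng(8)
src = rng.normal(size=(5, 40))                               # bilingual embeddings assumed pre-trained
tgt = src[[2, 0, 4, 1, 3]] + rng.normal(0, 0.01, (5, 40))    # permuted, slightly perturbed copies
plan = sinkhorn_plan(src, tgt)
print(plan.argmax(axis=1))   # expected [1 3 0 4 2]: each source word mapped to its perturbed copy
```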


#16 A Representation Learning Framework for Multi-Source Transfer Parsing

Authors: Jiang Guo, Wanxiang Che, David Yarowsky, Haifeng Wang, Ting Liu

Cross-lingual model transfer has been a promising approach for inducing dependency parsers for low-resource languages where annotated treebanks are not available. The major obstacles to model transfer are two-fold: (1) lexical features are not directly transferable across languages; (2) target language-specific syntactic structures are difficult to recover. To address these two challenges, we present a novel representation learning framework for multi-source transfer parsing. Our framework allows multi-source transfer parsing using full lexical features straightforwardly. By evaluating on the Google universal dependency treebanks (v2.0), our best models yield an absolute improvement of 6.53% in averaged labeled attachment score, as compared with delexicalized multi-source transfer models. We also significantly outperform the most recently proposed state-of-the-art transfer system.


#17 Evaluation of Semantic Dependency Labeling Across Domains

Authors: Svetlana Stoyanchev, Amanda Stent, Srinivas Bangalore

One of the key concerns in computational semantics is to construct a domain independent semantic representation which captures the richness of natural language, yet can be quickly customized to a specific domain for practical applications. We propose to use generic semantic frames defined in FrameNet, a domain-independent semantic resource, as an intermediate semantic representation for language understanding in dialog systems. In this paper we: (a) outline a novel method for FrameNet-style semantic dependency labeling that builds on a syntactic dependency parse; and (b) compare the accuracy of domain-adapted and generic approaches to semantic parsing for dialog tasks, using a frame-annotated corpus of human-computer dialogs in an airline reservation domain.


#18 Addressing a Question Answering Challenge by Combining Statistical Methods with Inductive Rule Learning and Reasoning

Authors: Arindam Mitra, Chitta Baral

A group of researchers from Facebook has recently proposed a set of 20 question-answering tasks (Facebook's bAbI dataset) as a challenge for the natural language understanding ability of an intelligent agent. These tasks are designed to measure various skills of an agent, such as fact-based question answering, simple induction, the ability to find paths, co-reference resolution and many more. Their goal is to aid in the development of systems that can learn to solve such tasks and to allow a proper evaluation of such systems. They show that existing systems cannot fully solve many of these toy tasks. In this work, we present a system that excels at all the tasks except one. The proposed model of the agent uses the Answer Set Programming (ASP) language as the primary knowledge representation and reasoning language, along with standard statistical Natural Language Processing (NLP) models. Given a training dataset containing a set of narrations, questions and their answers, the agent jointly uses a translation system, an Inductive Logic Programming algorithm and statistical NLP methods to learn the knowledge needed to answer similar questions. Our results demonstrate that the introduction of a reasoning module significantly improves the performance of an intelligent agent.


#19 A Morphology-Aware Network for Morphological Disambiguation

Authors: Eray Yildiz, Caglar Tirkaz, H. Sahin, Mustafa Eren, Omer Sonmez

Agglutinative languages such as Turkish, Finnish and Hungarian require morphological disambiguation before further processing due to the complex morphology of words. A morphological disambiguator is used to select the correct morphological analysis of a word. Morphological disambiguation is important because it generally is one of the first steps of natural language processing and its performance affects subsequent analyses. In this paper, we propose a system that uses deep learning techniques for morphological disambiguation. Many of the state-of-the-art results in computer vision, speech recognition and natural language processing have been obtained through deep learning models. However, applying deep learning techniques to morphologically rich languages is not well studied. In this work, while we focus on Turkish morphological disambiguation, we also present results for French and German in order to show that the proposed architecture achieves high accuracy with no language-specific feature engineering or additional resources. In the experiments, we achieve 84.12%, 88.35% and 93.78% morphological disambiguation accuracy among the ambiguous words for Turkish, German and French respectively.


#20 Non-Linear Similarity Learning for Compositionality

Authors: Masashi Tsubaki, Kevin Duh, Masashi Shimbo, Yuji Matsumoto

Many NLP applications rely on the existence of similarity measures over text data. Although word vector space models provide good similarity measures between words, phrasal and sentential similarities derived from the composition of individual words remain a difficult problem. In this paper, we propose a new method of non-linear similarity learning for semantic compositionality. In this method, word representations are learned through the similarity learning of sentences in a high-dimensional space with kernel functions. On the task of predicting the semantic similarity of two sentences (SemEval 2014, Task 1), our method outperforms linear baselines, feature engineering approaches and recursive neural networks, and achieves competitive results with long short-term memory models.
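
A toy illustration of scoring composed sentence vectors with a non-linear (Gaussian) kernel instead of a linear dot product; the paper additionally learns the word representations through this similarity, which is not shown here, and the embeddings and bandwidth below are placeholders.

```python
import numpy as np

rng = np.random.default_rng(9)
emb = {w: rng.normal(size=50) for w in
       "a dog is running in the field puppy sprinting across".split()}

def compose(sentence):
    """Additive composition of word vectors into a sentence vector."""
    return sum((emb[w] for w in sentence.split()), np.zeros(50))

def rbf_similarity(s1, s2, gamma=0.01):
    """Non-linear (Gaussian kernel) similarity between composed sentence vectors."""
    diff = compose(s1) - compose(s2)
    return float(np.exp(-gamma * diff @ diff))

print(rbf_similarity("a dog is running in the field",
                     "a puppy is sprinting across the field"))
```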


#21 Instructable Intelligent Personal Agent

Authors: Amos Azaria, Jayant Krishnamurthy, Tom Mitchell

Unlike traditional machine learning methods, humans often learn from natural language instruction. As users become increasingly accustomed to interacting with mobile devices using speech, their interest in instructing these devices in natural language is likely to grow. We introduce our Learning by Instruction Agent (LIA), an intelligent personal agent that users can teach to perform new action sequences to achieve new commands, using solely natural language interaction. LIA uses a CCG semantic parser to ground the semantics of each command in terms of primitive executable procedures defining sensors and effectors of the agent. Given a natural language command that LIA does not understand, it prompts the user to explain how to achieve the command through a sequence of steps, also specified in natural language. A novel lexicon induction algorithm enables LIA to generalize across taught commands, e.g., having been taught how to "forward an email to Alice," LIA can correctly interpret the command "forward this email to Bob." A user study involving email tasks demonstrates that users voluntarily teach LIA new commands, and that these taught commands significantly reduce task completion time. These results demonstrate the potential of natural language instruction as a significant, under-explored paradigm for machine learning.


#22 Modeling Evolving Relationships Between Characters in Literary Novels

Authors: Snigdha Chaturvedi, Shashank Srivastava, Hal Daume III, Chris Dyer

Studying characters plays a vital role in computationally representing and interpreting narratives. Unlike previous work, which has focused on inferring character roles, we focus on the problem of modeling their relationships. Rather than assuming a fixed relationship for a character pair, we hypothesize that relationships temporally evolve with the progress of the narrative, and formulate the problem of relationship modeling as a structured prediction problem. We propose a semi-supervised framework to learn relationship sequences from fully as well as partially labeled data. We present a Markovian model capable of accumulating historical beliefs about the relationship and status changes. We use a set of rich linguistic and semantically motivated features that incorporate world knowledge to investigate the textual content of narrative. We empirically demonstrate that such a framework outperforms competitive baselines.


#23 Ask, and Shall You Receive? Understanding Desire Fulfillment in Natural Language Text

Authors: Snigdha Chaturvedi, Dan Goldwasser, Hal Daume III

The ability to comprehend wishes or desires and their fulfillment is important to Natural Language Understanding. This paper introduces the task of identifying if a desire expressed by a subject in a given short piece of text was fulfilled. We propose various unstructured and structured models that capture fulfillment cues such as the subject's emotional state and actions. Our experiments with two different datasets demonstrate the importance of understanding the narrative and discourse structure to address this task.


#24 Minimally-Constrained Multilingual Embeddings via Artificial Code-Switching

Authors: Michael Wick, Pallika Kanani, Adam Pocock

We present a method that consumes a large corpus of multilingual text and produces a single, unified word embedding in which the word vectors generalize across languages. In contrast to current approaches that require language identification, our method is agnostic about the languages in which the documents in the corpus are expressed, and does not rely on parallel corpora to constrain the spaces. Instead we utilize a small set of human-provided word translations---which are often freely and readily available. We can encode such word translations as hard constraints in the model's objective functions; however, we find that we can more naturally constrain the space by allowing words in one language to borrow distributional statistics from context words in another language. We achieve this via a process we term artificial code-switching. As the name suggests, we induce code-switching so that words across multiple languages appear in contexts together. Not only do embedding models trained on code-switched data learn common cross-lingual structure; this common structure also allows an NLP model trained in a source language to generalize to multiple target languages (achieving up to 80% of the accuracy of models trained with target-language data).
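
The augmentation step itself is simple to sketch: before training any monolingual embedding model, words are randomly replaced with translations from a small dictionary so that words from different languages share context windows. The dictionary, corpus and replacement rate below are illustrative.

```python
import random

rng = random.Random(0)

# Small human-provided translation dictionary (the only cross-lingual supervision).
translations = {"dog": ["perro", "chien"], "house": ["casa", "maison"],
                "perro": ["dog"], "casa": ["house"]}

def code_switch(tokens, rate=0.3):
    """Randomly replace words with one of their translations, so that words from
    different languages end up sharing context windows during embedding training."""
    out = []
    for tok in tokens:
        if tok in translations and rng.random() < rate:
            out.append(rng.choice(translations[tok]))
        else:
            out.append(tok)
    return out

corpus = [["the", "dog", "sleeps", "in", "the", "house"],
          ["el", "perro", "duerme", "en", "la", "casa"]]
switched = [code_switch(sent) for sent in corpus]
print(switched)   # the switched corpus is then fed to any monolingual embedding model
```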


#25 Numerical Relation Extraction with Minimal Supervision

Authors: Aman Madaan, Ashish Mittal, Mausam, Ganesh Ramakrishnan, Sunita Sarawagi

We study a novel task of numerical relation extraction with the goal of extracting relations where one of the arguments is a number or a quantity (e.g., atomic_number(Aluminium, 13), inflation_rate(India, 10.9%)). This task presents peculiar challenges not found in standard IE, such as the difficulty of matching numbers in distant supervision and the importance of units. We design two extraction systems that require minimal human supervision per relation: (1) NumberRule, a rule based extractor, and (2) NumberTron, a probabilistic graphical model. We find that both systems dramatically outperform MultiR, a state-of-the-art non-numerical IE model, obtaining up to 25 points F-score improvement.
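
A toy rule-based extractor in the spirit of NumberRule (not the authors' system): each relation is given a hypothetical trigger keyword and a unit/format pattern, and a triple is emitted when the entity, the keyword and a matching number co-occur in a sentence.

```python
import re

# Hypothetical per-relation rules: a trigger keyword and a unit/format pattern.
RULES = {
    "inflation_rate": {"keyword": "inflation", "number": r"\d+(?:\.\d+)?\s*%"},
    "population":     {"keyword": "population", "number": r"\d+(?:\.\d+)?\s*(?:million|billion)"},
}

def extract(sentence, entity):
    """Return (relation, entity, value) triples when the entity, the relation's
    trigger keyword, and a matching number all occur in the sentence."""
    triples = []
    low = sentence.lower()
    if entity.lower() not in low:
        return triples
    for rel, rule in RULES.items():
        if rule["keyword"] in low:
            for m in re.finditer(rule["number"], sentence):
                triples.append((rel, entity, m.group(0)))
    return triples

print(extract("India's inflation rose to 10.9% last year.", "India"))
# -> [('inflation_rate', 'India', '10.9%')]
```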