| Total: 43
Recently, neural network models for natural language processing tasks have been increasingly focused on for their ability of alleviating the burden of manual feature engineering. However, the previous neural models cannot extract the complicated feature compositions as the traditional methods with discrete features. In this work, we propose a feature-enriched neural model for joint Chinese word segmentation and part-of-speech tagging task. Specifically, to simulate the feature templates of traditional discrete feature based models, we use different filters to model the complex compositional features with convolutional and pooling layer, and then utilize long distance dependency information with recurrent layer. Experimental results on five different datasets show the effectiveness of our proposed model.
Deriving event storylines is an effective summarization method to succinctly organize extensive information, which can significantly alleviate the pain of information overload. The critical challenge is the lack of widely recognized definition of storyline metric. Prior studies have developed various approaches based on different assumptions about users' interests. These works can extract interesting patterns, but their assumptions do not guarantee that the derived patterns will match users' preference. On the other hand, their exclusiveness of single modality source misses cross-modality information. This paper proposes a method, multimodal imitation learning via Generative Adversarial Networks(MIL-GAN), to directly model users' interests as reflected by various data. In particular, the proposed model addresses the critical challenge by imitating users' demonstrated storylines. Our proposed model is designed to learn the reward patterns given user-provided storylines and then applies the learned policy to unseen data. The proposed approach is demonstrated to be capable of acquiring the user's implicit intent and outperforming competing methods by a substantial margin with a user study.
While recent neural machine translation approaches have delivered state-of-the-art performance for resource-rich language pairs, they suffer from the data scarcity problem for resource-scarce language pairs. Although this problem can be alleviated by exploiting a pivot language to bridge the source and target languages, the source-to-pivot and pivot-to-target translation models are usually independently trained. In this work, we introduce a joint training algorithm for pivot-based neural machine translation. We propose three methods to connect the two models and enable them to interact with each other during training. Experiments on Europarl and WMT corpora show that joint training of source-to-pivot and pivot-to-target models leads to significant improvements over independent training across various languages.
The ability to solve probability word problems such as those found in introductory discrete mathematics textbooks, is an important cognitive and intellectual skill. In this paper, we develop a two-step end-to-end fully automated approach for solving such questions that is able to automatically provide answers to exercises about probability formulated in natural language.In the first step, a question formulated in natural language is analysed and transformed into a high-level model specified in a declarative language. In the second step, a solution to the high-level model is computed using a probabilistic programming system. On a dataset of 2160 probability problems, our solver is able to correctly answer 97.5% of the questions given a correct model. On the end-to-end evaluation, we are able to answer 12.5% of the questions (or 31.1% if we exclude examples not supported by design).
Stance classification, which aims at detecting the stance expressed in text towards a specific target, is an emerging problem in sentiment analysis. A major difference between stance classification and traditional aspect-level sentiment classification is that the identification of stance is dependent on target which might not be explicitly mentioned in text. This indicates that apart from text content, the target information is important to stance detection. To this end, we propose a neural network-based model, which incorporates target-specific information into stance classification by following a novel attention mechanism. In specific, the attention mechanism is expected to locate the critical parts of text which are related to target. Our evaluations on both the English and Chinese Stance Detection datasets show that the proposed model achieves the state-of-the-art performance.
Grounding, or localizing, a textual phrase in an image is a challenging problem that is integral to visual language understanding. Previous approaches to this task typically make use of candidate region proposals, where end performance depends on that of the region proposal method and additional computational costs are incurred. In this paper, we treat grounding as a regression problem and propose a method to directly identify the region referred to by a textual phrase, eliminating the need for external candidate region prediction. Our approach uses deep neural networks to combine image and text representations and refines the target region with attention models over both image subregions and words in the textual phrase. Despite the challenging nature of this task and sparsity of available data, in evaluation on the ReferIt dataset, our proposed method achieves a new state-of-the-art in performance of 37.26% accuracy, surpassing the previously reported best by over 5 percentage points. We find that combining image and text attention models and an image attention area-sensitive loss function contribute to substantial improvements.
Distant supervised relation extraction (RE) has been an effective way of finding novel relational facts from text without labeled training data. Typically it can be formalized as a multi-instance multi-label problem.In this paper, we introduce a novel neural approach for distant supervised (RE) with specific focus on attention mechanisms.Unlike the feature-based logistic regression model and compositional neural models such as CNN, our approach includes two major attention-based memory components, which is capable of explicitly capturing the importance of each context word for modeling the representation of the entity pair, as well as the intrinsic dependencies between relations.Such importance degree and dependency relationship are calculated with multiple computational layers, each of which is a neural attention model over an external memory. Experiment on real-world datasets shows that our approach performs significantly and consistently better than various baselines.
The main goal of this paper is to describe a general approach to the problem of understanding linguistic phenomena, as they appear in lexical semantics, through the analysis of large scale resources, while exploiting these results to improve the quality of the resources themselves. The main contributions are: the approach itself, a formal quantitative measure of language diversity; a set of formal quantitative measures of resource incompleteness and a large scale resource, called the Universal Knowledge Core (UKC) built following the methodology proposed. As a concrete example of an application, we provide an algorithm for distinguishing polysemes from homonyms, as stored in the UKC.
Providing a plausible explanation for the relationship between two related entities is an important task in some applications of knowledge graphs, such as in search engines. However, most existing methods require a large number of manually labeled training data, which cannot be applied in large-scale knowledge graphs due to the expensive data annotation. In addition, these methods typically rely on costly handcrafted features. In this paper, we propose an effective pairwise ranking model by leveraging clickthrough data of a Web search engine to address these two problems. We first construct large-scale training data by leveraging the query-title pairs derived from clickthrough data of a Web search engine. Then, we build a pairwise ranking model which employs a convolutional neural network to automatically learn relevant features. The proposed model can be easily trained with backpropagation to perform the ranking task. The experiments show that our method significantly outperforms several strong baselines.
Capturing the semantic interaction of pairs of words across arguments and proper argument representation are both crucial issues in implicit discourse relation recognition. The current state-of-the-art represents arguments as distributional vectors that are computed via bi-directional Long Short-Term Memory networks (BiLSTMs), known to have significant model complexity.In contrast, we demonstrate that word-weighted averaging can encode argument representation which can incorporate word pair information efficiently. By saving an order of magnitude in parameters, our proposed model achieves equivalent performance, but trains seven times faster.
In this work we formulate the problem of image captioning as a multimodal translation task. Analogous to machine translation, we present a sequence-to-sequence recurrent neural networks (RNN) model for image caption generation. Different from most existing work where the whole image is represented by convolutional neural network (CNN) feature, we propose to represent the input image as a sequence of detected objects which feeds as the source sequence of the RNN model. In this way, the sequential representation of an image can be naturally translated to a sequence of words, as the target sequence of the RNN model. To represent the image in a sequential way, we extract the objects features in the image and arrange them in a order using convolutional neural networks. To further leverage the visual information from the encoded objects, a sequential attention layer is introduced to selectively attend to the objects that are related to generate corresponding words in the sentences. Extensive experiments are conducted to validate the proposed approach on popular benchmark dataset, i.e., MS COCO, and the proposed model surpasses the state-of-the-art methods in all metrics following the dataset splits of previous work. The proposed approach is also evaluated by the evaluation server of MS COCO captioning challenge, and achieves very competitive results, e.g., a CIDEr of 1.029 (c5) and 1.064 (c40).
The lack of labeled exemplars is an important factor that makes the task of multimedia event detection (MED) complicated and challenging. Utilizing artificially picked and labeled external sources is an effective way to enhance the performance of MED. However, building these data usually requires professional human annotators, and the procedure is too time-consuming and costly to scale. In this paper, we propose a new robust dictionary learning framework for complex event detection, which is able to handle both labeled and easy-to-get unlabeled web videos by sharing the same dictionary. By employing the lq-norm based loss jointly with the structured sparsity based regularization, our model shows strong robustness against the substantial noisy and outlier videos from open source. We exploit an effective optimization algorithm to solve the proposed highly non-smooth and non-convex problem. Extensive experiment results over standard datasets of TRECVID MEDTest 2013 and TRECVID MEDTest 2014 demonstrate the effectiveness and superiority of the proposed framework on complex event detection.
Most of the existing multi-relational network embedding methods, e.g., TransE, are formulated to preserve pair-wise connectivity structures in the networks. With the observations that significant triangular connectivity structures and parallelogram connectivity structures found in many real multi-relational networks are often ignored and that a hard-constraint commonly adopted by most of the network embedding methods is inaccurate by design, we propose a novel representation learning model for multi-relational networks which can alleviate both fundamental limitations. Scalable learning algorithms are derived using the stochastic gradient descent algorithm and negative sampling. Extensive experiments on real multi-relational network datasets of WordNet and Freebase demonstrate the efficacy of the proposed model when compared with the state-of-the-art embedding methods.
Tree-structured neural networks have proven to be effective in learning semantic representations by exploitingsyntactic information. In spite of their success, most existing models suffer from the underfitting problem: they recursively use the same shared compositional function throughout the whole compositional process and lack expressive power due to inability to capture the richness of compositionality.In this paper, we address this issue by introducing the dynamic compositional neural networks over tree structure (DC-TreeNN), in which the compositional function is dynamically generated by a meta network.The role of meta-network is to capture the metaknowledge across the different compositional rules and formulate them. Experimental results on two typical tasks show the effectiveness of the proposed models.
Representing a sentence with a fixed vector has shown its effectiveness in various NLP tasks. Most of the existing methods are based on neural network, which recursively apply different composition functions to a sequence of word vectors thereby obtaining a sentence vector.A hypothesis behind these approaches is that the meaning of any phrase can be composed of the meanings of its constituents.However, many phrases, such as idioms, are apparently non-compositional.To address this problem, we introduce a parameterized compositional switch, which outputs a scalar to adaptively determine whether the meaning of a phrase should be composed of its two constituents.We evaluate our model on five datasets of sentiment classification and demonstrate its efficacy with qualitative and quantitative experimental analysis .
Aspect-level sentiment classification aims at identifying the sentiment polarity of specific target in its context. Previous approaches have realized the importance of targets in sentiment classification and developed various methods with the goal of precisely modeling thier contexts via generating target-specific representations. However, these studies always ignore the separate modeling of targets. In this paper, we argue that both targets and contexts deserve special treatment and need to be learned their own representations via interactive learning. Then, we propose the interactive attention networks (IAN) to interactively learn attentions in the contexts and targets, and generate the representations for targets and contexts separately. With this design, the IAN model can well represent a target and its collocative context, which is helpful to sentiment classification. Experimental results on SemEval 2014 Datasets demonstrate the effectiveness of our model.
Topic models have been successfully applied in lexicon extraction. However, most previous methods are limited to document-aligned data. In this paper, we try to address two challenges of applying topic models to lexicon extraction in non-parallel data: 1) hard to model the word relationship and 2) noisy seed dictionary. To solve these two challenges, we propose two new bilingual topic models to better capture the semantic information of each word while discriminating the multiple translations in a noisy seed dictionary. We extend the scope of topic models by inverting the roles of "word" and "document". In addition, to solve the problem of noise in seed dictionary, we incorporate the probability of translation selection in our models. Moreover, we also propose an effective measure to evaluate the similarity of words in different languages and select the optimal translation pairs. Experimental results using real world data demonstrate the utility and efficacy of the proposed models.
Recent work on argument persuasiveness has focused on determining how persuasive an argument is. Oftentimes, however, it is equally important to understand why an argument is unpersuasive, as it is difficult for an author to make her argument more persuasive unless she first knows what errors made it unpersuasive. Motivated by this practical concern, we (1) annotate a corpus of debate comments with not only their persuasiveness scores but also the errors they contain, (2) propose an approach to persuasiveness scoring and error identification that outperforms competing baselines, and (3) show that the persuasiveness scores computed by our approach can indeed be explained by the errors it identifies.
In this work, we focus on semantic parsing of natural language conversations. Most existing methods for semantic parsing are based on understanding the semantics of a single sentence at a time. However, understanding conversations also requires an understanding of conversational context and discourse structure across sentences. We formulate semantic parsing of conversations as a structured prediction task, incorporating structural features that model the `flow of discourse' across sequences of utterances. We create a dataset for semantic parsing of conversations, consisting of 113 real-life sequences of interactions of human users with an automated email assistant. The data contains 4759 natural language statements paired with annotated logical forms. Our approach yields significant gains in performance over traditional semantic parsing.
Lexically and syntactically simpler sentences result in shorter reading time and better understanding in many people. However, no reliable systems for automatic assessment of absolute sentence complexity have been proposed so far. Instead, the assessment is usually done manually, requiring expert human annotators. To address this problem, we first define the sentence complexity assessment as a five-level classification task, and build a ‘gold standard’ dataset. Next, we propose robust systems for sentence complexity assessment, using a novel set of features based on leveraging lexical properties of freely available corpora, and investigate the impact of the feature type and corpus size on the classification performance.
Answer sentence selection has been widely adopted recently for benchmarking techniques in Question Answering. Previous proposals for the task are essentially general solutions taking the form of neural networks that measure semantic similarity. In contrast, the present paper describes a simple technique to take advantage of such general-purpose tools for dealing with questions and answer sentences without changing the base system. The technique involves replacing wh-words in input questions with a word denoting the prototype of all answers. These transformed questions are passed as input to an existing neural network built for measuring semantic similarity. This technique is evaluated on two different neural network architectures over two datasets: TrecQA and WikiQA. Results of our experiments show improvement in overall accuracy across most question types we are interested in: `who', `when' and `where'-type questions.
Headline generation is a task of abstractive text summarization, and previously suffers from the immaturity of natural language generation techniques. Recent success of neural sentence summarization models shows the capacity of generating informative, fluent headlines conditioned on selected recapitulative sentences. In this paper, we investigate the extension of sentence summarization models to the document headline generation task. The challenge is that extending the sentence summarization model to consider more document information will mostly confuse the model and hurt the performance. In this paper, we propose a coarse-to-fine approach, which first identifies the important sentences of a document using document summarization techniques, and then exploits a multi-sentence summarization model with hierarchical attention to leverage the important sentences for headline generation. Experimental results on a large real dataset demonstrate the proposed approach significantly improves the performance of neural sentence summarization models on the headline generation task.
A word in natural language can be polysemous, having multiple meanings, as well as synonymous, meaning the same thing as other words. Word sense induction attempts to find the senses of polysemous words. Synonymy detection attempts to find when two words are interchangeable. We combine these tasks, first inducing word senses and then detecting similar senses to form word-sense synonym sets (synsets) in an unsupervised fashion. Given pairs of images and text with noun phrase labels, we perform synset induction to produce collections of underlying concepts described by one or more noun phrases. We find that considering multi-modal features from both visual and textual context yields better induced synsets than using either context alone. Human evaluations show that our unsupervised, multi-modally induced synsets are comparable in quality to annotation-assisted ImageNet synsets, achieving about 84% of ImageNet synsets' approval.
Recently proposed Story Cloze Test [Mostafazadeh et al., 2016] is a commonsense machine comprehension application to deal with natural language understanding problem. This dataset contains a lot of story tests which require commonsense inference ability. Unfortunately, the training data is almost unsupervised where each context document followed with only one positive sentence that can be inferred from the context. However, in the testing period, we must make inference from two candidate sentences. To tackle this problem, we employ the generative adversarial networks (GANs) to generate fake sentence. We proposed a Conditional GANs in which the generator is conditioned by the context. Our experiments show the advantage of the CGANs in discriminating sentence and achieve state-of-the-art results in commonsense story reading comprehension task compared with previous feature engineering and deep learning methods.
User attribute classification aims to identify users’ attributes (e.g., gender, age and profession) by leveraging user generated content. However, conventional approaches to user attribute classification focus on single attribute classification involving only one user attribute, which completely ignores the relationship among various user attributes. In this paper, we confront a novel scenario in user attribute classification where relevant user attributes are jointly learned, attempting to make the relevant attribute classification tasks help each other. Specifically, we propose a joint learning approach, namely Aux-LSTM, which first learns a proper auxiliary representation between the related tasks and then leverages the auxiliary representation to integrate the learning process in both tasks. Empirical studies demonstrate the effectiveness of our proposed approach to joint learning on relevant user attributes.