IJCAI.2022 - Natural Language Processing

Total: 73

#1 Aspect-based Sentiment Analysis with Opinion Tree Generation [PDF] [Copy] [Kimi]

Authors: Xiaoyi Bao ; Wang Zhongqing ; Xiaotong Jiang ; Rong Xiao ; Shoushan Li

Existing studies usually extract these sentiment elements by decomposing the complex structure prediction task into multiple subtasks. Despite their effectiveness, these methods ignore the semantic structure in ABSA problems and require extensive task-specific designs. In this study, we introduce an Opinion Tree Generation task, which aims to jointly detect all sentiment elements in a tree. We believe that the opinion tree can reveal a more comprehensive and complete aspect-level sentiment structure. Furthermore, we propose a pre-trained model to integrate both syntax and semantic features for opinion tree generation. On one hand, a pre-trained model with large-scale unlabeled data is important for the tree generation model. On the other hand, the syntax and semantic features are very effective for forming the opinion tree structure. Extensive experiments show the superiority of our proposed method. The results also validate the tree structure is effective to generate sentimental elements.

#2 Speaker-Guided Encoder-Decoder Framework for Emotion Recognition in Conversation [PDF] [Copy] [Kimi]

Authors: Yinan Bao ; Qianwen Ma ; Lingwei Wei ; Wei Zhou ; Songlin Hu

The emotion recognition in conversation (ERC) task aims to predict the emotion label of an utterance in a conversation. Since the dependencies between speakers are complex and dynamic, which consist of intra- and inter-speaker dependencies, the modeling of speaker-specific information is a vital role in ERC. Although existing researchers have proposed various methods of speaker interaction modeling, they cannot explore dynamic intra- and inter-speaker dependencies jointly, leading to the insufficient comprehension of context and further hindering emotion prediction. To this end, we design a novel speaker modeling scheme that explores intra- and inter-speaker dependencies jointly in a dynamic manner. Besides, we propose a Speaker-Guided Encoder-Decoder (SGED) framework for ERC, which fully exploits speaker information for the decoding of emotion. We use different existing methods as the conversational context encoder of our framework, showing the high scalability and flexibility of the proposed framework. Experimental results demonstrate the superiority and effectiveness of SGED.

#3 Learning Meta Word Embeddings by Unsupervised Weighted Concatenation of Source Embeddings [PDF] [Copy] [Kimi]

Author: Danushka Bollegala

Given multiple source word embeddings learnt using diverse algorithms and lexical resources, meta word embedding learning methods attempt to learn more accurate and wide-coverage word embeddings. Prior work on meta-embedding has repeatedly discovered that simple vector concatenation of the source embeddings to be a competitive baseline. However, it remains unclear as to why and when simple vector concatenation can produce accurate meta-embeddings. We show that weighted concatenation can be seen as a spectrum matching operation between each source embedding and the meta-embedding, minimising the pairwise inner-product loss. Following this theoretical analysis, we propose two \emph{unsupervised} methods to learn the optimal concatenation weights for creating meta-embeddings from a given set of source embeddings. Experimental results on multiple benchmark datasets show that the proposed weighted concatenated meta-embedding methods outperform previously proposed meta-embedding learning methods.

#4 PCVAE: Generating Prior Context for Dialogue Response Generation [PDF] [Copy] [Kimi]

Authors: Zefeng Cai ; Zerui Cai

Conditional Variational AutoEncoder (CVAE) is promising for modeling one-to-many relationships in dialogue generation, as it can naturally generate many responses from a given context. However, the conventional used continual latent variables in CVAE are more likely to generate generic rather than distinct and specific responses. To resolve this problem, we introduce a novel discrete variable called prior context which enables the generation of favorable responses. Specifically, we present Prior Context VAE (PCVAE), a hierarchical VAE that learns prior context from data automatically for dialogue generation. Meanwhile, we design Active Codeword Transport (ACT) to help the model actively discover potential prior context. Moreover, we propose Autoregressive Compatible Arrangement (ACA) that enables modeling prior context in autoregressive style, which is crucial for selecting appropriate prior context according to a given context. Extensive experiments demonstrate that PCVAE can generate distinct responses and significantly outperforms strong baselines.

#5 Towards Joint Intent Detection and Slot Filling via Higher-order Attention [PDF] [Copy] [Kimi]

Authors: Dongsheng Chen ; Zhiqi Huang ; Xian Wu ; Shen Ge ; Yuexian Zou

Recently, attention-based models for joint intent detection and slot filling have achieved state-of-the-art performance. However, we think the conventional attention can only capture the first-order feature interaction between two tasks and is insufficient. To address this issue, we propose a unified BiLinear attention block, which leverages bilinear pooling to synchronously explore both the contextual and channel-wise bilinear attention distributions to capture the second-order interactions between the input intent and slot features. Higher-order interactions are constructed by combining many such blocks and exploiting Exponential Linear activations. Furthermore, we present a Higher-order Attention Network (HAN) to jointly model them. The experimental results show that our approach outperforms the state-of-the-art results. We also conduct experiments on the new SLURP dataset, and give a discussion on HAN’s properties, i.e., robustness and generalization.

#6 Effective Graph Context Representation for Document-level Machine Translation [PDF] [Copy] [Kimi]

Authors: Kehai Chen ; Muyun Yang ; Masao Utiyama ; Eiichiro Sumita ; Rui Wang ; Min Zhang

Document-level neural machine translation (DocNMT) universally encodes several local sentences or the entire document. Thus, DocNMT does not consider the relevance of document-level contextual information, for example, some context (i.e., content words, logical order, and co-occurrence relation) is more effective than another auxiliary context (i.e., functional and auxiliary words). To address this issue, we first utilize the word frequency information to recognize content words in the input document, and then use heuristical relations to summarize content words and sentences as a graph structure without relying on external syntactic knowledge. Furthermore, we apply graph attention networks to this graph structure to learn its feature representation, which allows DocNMT to more effectively capture the document-level context. Experimental results on several widely-used document-level benchmarks demonstrated the effectiveness of the proposed approach.

#7 DictBERT: Dictionary Description Knowledge Enhanced Language Model Pre-training via Contrastive Learning [PDF1] [Copy] [Kimi1]

Authors: Qianglong Chen ; Feng-Lin Li ; Guohai Xu ; Ming Yan ; Ji Zhang ; Yin Zhang

Although pre-trained language models (PLMs) have achieved state-of-the-art performance on various natural language processing (NLP) tasks, they are shown to be lacking in knowledge when dealing with knowledge driven tasks. Despite the many efforts made for injecting knowledge into PLMs, this problem remains open. To address the challenge, we propose DictBERT, a novel approach that enhances PLMs with dictionary knowledge which is easier to acquire than knowledge graph (KG). During pre-training, we present two novel pre-training tasks to inject dictionary knowledge into PLMs via contrastive learning: dictionary entry prediction and entry description discrimination. In fine-tuning, we use the pre-trained DictBERT as a plugin knowledge base (KB) to retrieve implicit knowledge for identified entries in an input sequence, and infuse the retrieved knowledge into the input to enhance its representation via a novel extra-hop attention mechanism. We evaluate our approach on a variety of knowledge driven and language understanding tasks, including NER, relation extraction, CommonsenseQA, OpenBookQA and GLUE. Experimental results demonstrate that our model can significantly improve typical PLMs: it gains a substantial improvement of 0.5%, 2.9%, 9.0%, 7.1% and 3.3% on BERT-large respectively, and is also effective on RoBERTa-large.

#8 Interpretable AMR-Based Question Decomposition for Multi-hop Question Answering [PDF] [Copy] [Kimi]

Authors: Zhenyun Deng ; Yonghua Zhu ; Yang Chen ; Michael Witbrock ; Patricia Riddle

Effective multi-hop question answering (QA) requires reasoning over multiple scattered paragraphs and providing explanations for answers. Most existing approaches cannot provide an interpretable reasoning process to illustrate how these models arrive at an answer. In this paper, we propose a Question Decomposition method based on Abstract Meaning Representation (QDAMR) for multi-hop QA, which achieves interpretable reasoning by decomposing a multi-hop question into simpler subquestions and answering them in order. Since annotating the decomposition is expensive, we first delegate the complexity of understanding the multi-hop question to an AMR parser. We then achieve decomposition of a multi-hop question via segmentation of the corresponding AMR graph based on the required reasoning type. Finally, we generate sub-questions using an AMR-to-Text generation model and answer them with an off-the-shelf QA model. Experimental results on HotpotQA demonstrate that our approach is competitive for interpretable reasoning and that the sub-questions generated by QDAMR are well-formed, outperforming existing question-decomposition-based multihop QA approaches.

#9 Interactive Information Extraction by Semantic Information Graph [PDF1] [Copy] [Kimi1]

Authors: Siqi Fan ; Yequan Wang ; Jing Li ; Zheng Zhang ; Shuo Shang ; Peng Han

Information extraction (IE) mainly focuses on three highly correlated subtasks, i.e., entity extraction, relation extraction and event extraction. Recently, there are studies using Abstract Meaning Representation (AMR) to utilize the intrinsic correlations among these three subtasks. AMR based models are capable of building the relationship of arguments. However, they are hard to deal with relations. In addition, the noises of AMR (i.e., tags unrelated to IE tasks, nodes with unconcerned conception, and edge types with complicated hierarchical structures) disturb the decoding processing of IE. As a result, the decoding processing limited by the AMR cannot be worked effectively. To overcome the shortages, we propose an Interactive Information Extraction (InterIE) model based on a novel Semantic Information Graph (SIG). SIG can guide our InterIE model to tackle the three subtasks jointly. Furthermore, the well-designed SIG without noise is capable of enriching entity and event trigger representation, and capturing the edge connection between the information types. Experimental results show that our InterIE achieves state-of-the-art performance on all IE subtasks on the benchmark dataset (i.e., ACE05-E+ and ACE05-E). More importantly, the proposed model is not sensitive to the decoding order, which goes beyond the limitations of AMR based methods.

#10 Global Inference with Explicit Syntactic and Discourse Structures for Dialogue-Level Relation Extraction [PDF] [Copy] [Kimi]

Authors: Hao Fei ; Jingye Li ; Shengqiong Wu ; Chenliang Li ; Donghong Ji ; Fei Li

Recent research attention for relation extraction has been paid to the dialogue scenario, i.e., dialogue-level relation extraction (DiaRE). Existing DiaRE methods either simply concatenate the utterances in a dialogue into a long piece of text, or employ naive words, sentences or entities to build dialogue graphs, while the structural characteristics in dialogues have not been fully utilized. In this work, we investigate a novel dialogue-level mixed dependency graph (D2G) and an argument reasoning graph (ARG) for DiaRE with a global relation reasoning mechanism. First, we model the entire dialogue into a unified and coherent D2G by explicitly integrating both syntactic and discourse structures, which enables richer semantic and feature learning for relation extraction. Second, we stack an ARG graph on top of D2G to further focus on argument inter-dependency learning and argument representation refinement, for sufficient argument relation inference. In our global reasoning framework, D2G and ARG work collaboratively, iteratively performing lexical, syntactic and semantic information exchange and representation learning over the entire dialogue context. On two DiaRE benchmarks, our framework shows considerable improvements over the current state-of-the-art baselines. Further analyses show that the model effectively solves the long-range dependence issue, and meanwhile gives explainable predictions.

#11 Conversational Semantic Role Labeling with Predicate-Oriented Latent Graph [PDF] [Copy] [Kimi1]

Authors: Hao Fei ; Shengqiong Wu ; Meishan Zhang ; Yafeng Ren ; Donghong Ji

Conversational semantic role labeling (CSRL) is a newly proposed task that uncovers the shallow semantic structures in a dialogue text. Unfortunately several important characteristics of the CSRL task have been overlooked by the existing works, such as the structural information integration, near-neighbor influence. In this work, we investigate the integration of a latent graph for CSRL. We propose to automatically induce a predicate-oriented latent graph (POLar) with a predicate-centered gaussian mechanism, by which the nearer and informative words to the predicate will be allocated with more attention. The POLar structure is then dynamically pruned and refined so as to best fit the task need. We additionally introduce an effective dialogue-level pre-trained language model, CoDiaBERT, for better supporting multiple utterance sentences and handling the speaker coreference issue in CSRL. Our system outperforms best-performing baselines on three benchmark CSRL datasets with big margins, especially achieving over 4% F1 score improvements on the cross-utterance argument detection. Further analyses are presented to better understand the effectiveness of our proposed methods.

#12 Inheriting the Wisdom of Predecessors: A Multiplex Cascade Framework for Unified Aspect-based Sentiment Analysis [PDF] [Copy] [Kimi]

Authors: Hao Fei ; Fei Li ; Chenliang Li ; Shengqiong Wu ; Jingye Li ; Donghong Ji

So far, aspect-based sentiment analysis (ABSA) has involved with total seven subtasks, in which, however the interactions among them have been left unexplored sufficiently. This work presents a novel multiplex cascade framework for unified ABSA and maintaining such interactions. First, we model total seven subtasks as a hierarchical dependency in the easy-to-hard order, based on which we then propose a multiplex decoding mechanism, transferring the sentiment layouts and clues in lower tasks to upper ones. The multiplex strategy enables highly-efficient subtask interflows and avoids repetitive training; meanwhile it sufficiently utilizes the existing data without requiring any further annotation. Further, based on the characteristics of aspect-opinion term extraction and pairing, we enhance our multiplex framework by integrating POS tag and syntactic dependency information for term boundary and pairing identification. The proposed Syntax-aware Multiplex (SyMux) framework enhances the ABSA performances on 28 subtasks (7×4 datasets) with big margins.

#13 Logically Consistent Adversarial Attacks for Soft Theorem Provers [PDF] [Copy] [Kimi]

Authors: Alexander Gaskell ; Yishu Miao ; Francesca Toni ; Lucia Specia

Recent efforts within the AI community have yielded impressive results towards “soft theorem proving” over natural language sentences using language models. We propose a novel, generative adversarial framework for probing and improving these models’ reasoning capabilities. Adversarial attacks in this domain suffer from the logical inconsistency problem, whereby perturbations to the input may alter the label. Our Logically consistent AdVersarial Attacker, LAVA, addresses this by combining a structured generative process with a symbolic solver, guaranteeing logical consistency. Our framework successfully generates adversarial attacks and identifies global weaknesses common across multiple target models. Our analyses reveal naive heuristics and vulnerabilities in these models’ reasoning capabilities, exposing an incomplete grasp of logical deduction under logic programs. Finally, in addition to effective probing of these models, we show that training on the generated samples improves the target model’s performance.

#14 Leveraging the Wikipedia Graph for Evaluating Word Embeddings [PDF] [Copy] [Kimi1]

Authors: Joachim Giesen ; Paul Kahlmeyer ; Frank Nussbaum ; Sina Zarrieß

Deep learning models for different NLP tasks often rely on pre-trained word embeddings, that is, vector representations of words. Therefore, it is crucial to evaluate pre-trained word embeddings independently of downstream tasks. Such evaluations try to assess whether the geometry induced by a word embedding captures connections made in natural language, such as, analogies, clustering of words, or word similarities. Here, traditionally, similarity is measured by comparison to human judgment. However, explicitly annotating word pairs with similarity scores by surveying humans is expensive. We tackle this problem by formulating a similarity measure that is based on an agent for routing the Wikipedia hyperlink graph. In this graph, word similarities are implicitly encoded by edges between articles. We show on the English Wikipedia that our measure correlates well with a large group of traditional similarity measures, while covering a much larger proportion of words and avoiding explicit human labeling. Moreover, since Wikipedia is available in more than 300 languages, our measure can easily be adapted to other languages, in contrast to traditional similarity measures.

#15 Fallacious Argument Classification in Political Debates [PDF] [Copy] [Kimi1]

Authors: Pierpaolo Goffredo ; Shohreh Haddadan ; Vorakit Vorakitphan ; Elena Cabrio ; Serena Villata

Fallacies play a prominent role in argumentation since antiquity due to their contribution to argumentation in critical thinking education. Their role is even more crucial nowadays as contemporary argumentation technologies face challenging tasks as misleading and manipulative information detection in news articles and political discourse, and counter-narrative generation. Despite some work in this direction, the issue of classifying arguments as being fallacious largely remains a challenging and an unsolved task. Our contribution is twofold: first, we present a novel annotated resource of 31 political debates from the U.S. Presidential Campaigns, where we annotated six main categories of fallacious arguments (i.e., ad hominem, appeal to authority, appeal to emotion, false cause, slogan, slippery slope) leading to 1628 annotated fallacious arguments; second, we tackle this novel task of fallacious argument classification and we define a neural architecture based on transformers outperforming state-of-the-art results and standard baselines. Our results show the important role played by argument components and relations in this task.

#16 Improving Few-Shot Text-to-SQL with Meta Self-Training via Column Specificity [PDF1] [Copy] [Kimi1]

Authors: Xinnan Guo ; Yongrui Chen ; Guilin Qi ; Tianxing Wu ; Hao Xu

The few-shot problem is an urgent challenge for single-table text-to-SQL. Existing methods ignore the potential value of unlabeled data, and merely rely on a coarse-grained Meta-Learning (ML) algorithm that neglects the differences of column contributions to the optimization object. This paper proposes a Meta Self-Training text-to-SQL (MST-SQL) method to solve the problem. Specifically, MST-SQL is based on column-wise HydraNet and adopts self-training as an effective mechanism to learn from readily available unlabeled samples. During each epoch of training, it first predicts pseudo-labels for unlabeled samples and then leverages them to update the parameters. A fine-grained ML algorithm is used in updating, which weighs the contribution of columns by their specificity, in order to further improve the generalizability. Extensive experimental results on both open-domain and domain-specific benchmarks reveal that our MST-SQL has significant advantages in few-shot scenarios, and is also competitive in standard supervised settings.

#17 FastDiff: A Fast Conditional Diffusion Model for High-Quality Speech Synthesis [PDF] [Copy] [Kimi]

Authors: Rongjie Huang ; Max W. Y. Lam ; Jun Wang ; Dan Su ; Dong Yu ; Yi Ren ; Zhou Zhao

Denoising diffusion probabilistic models (DDPMs) have recently achieved leading performances in many generative tasks. However, the inherited iterative sampling process costs hindered their applications to speech synthesis. This paper proposes FastDiff, a fast conditional diffusion model for high-quality speech synthesis. FastDiff employs a stack of time-aware location-variable convolutions of diverse receptive field patterns to efficiently model long-term time dependencies with adaptive conditions. A noise schedule predictor is also adopted to reduce the sampling steps without sacrificing the generation quality. Based on FastDiff, we design an end-to-end text-to-speech synthesizer, FastDiff-TTS, which generates high-fidelity speech waveforms without any intermediate feature (e.g., Mel-spectrogram). Our evaluation of FastDiff demonstrates the state-of-the-art results with higher-quality (MOS 4.28) speech samples. Also, FastDiff enables a sampling speed of 58x faster than real-time on a V100 GPU, making diffusion models practically applicable to speech synthesis deployment for the first time. We further show that FastDiff generalized well to the mel-spectrogram inversion of unseen speakers, and FastDiff-TTS outperformed other competing methods in end-to-end text-to-speech synthesis. Audio samples are available at https://FastDiff.github.io/.

#18 MuiDial: Improving Dialogue Disentanglement with Intent-Based Mutual Learning [PDF] [Copy] [Kimi]

Authors: Ziyou Jiang ; Lin Shi ; Celia Chen ; Fangwen Mu ; Yumin Zhang ; Qing Wang

The main goal of dialogue disentanglement is to separate the mixed utterances from a chat slice into independent dialogues. Existing models often utilize either an utterance-to-utterance (U2U) prediction to determine whether two utterances that have the “reply-to” relationship belong to one dialogue, or an utterance-to-thread (U2T) prediction to determine which dialogue-thread a given utterance should belong to. Inspired by mutual leaning, we propose MuiDial, a novel dialogue disentanglement model, to exploit the intent of each utterance and feed the intent to a mutual learning U2U-U2T disentanglement model. Experimental results and in-depth analysis on several benchmark datasets demonstrate the effectiveness and generalizability of our approach.

#19 AdMix: A Mixed Sample Data Augmentation Method for Neural Machine Translation [PDF] [Copy] [Kimi]

Authors: Chang Jin ; Shigui Qiu ; Nini Xiao ; Hao Jia

In Neural Machine Translation (NMT), data augmentation methods such as back-translation have proven their effectiveness in improving translation performance. In this paper, we propose a novel data augmentation approach for NMT, which is independent of any additional training data. Our approach, AdMix, consists of two parts: 1) introduce faint discrete noise (word replacement, word dropping, word swapping) into the original sentence pairs to form augmented samples; 2) generate new synthetic training data by softly mixing the augmented samples with their original samples in training corpus. Experiments on three translation datasets of different scales show that AdMix achieves significant improvements (1.0 to 2.7 BLEU points) over strong Transformer baseline. When combined with other data augmentation techniques (e.g., back-translation), our approach can obtain further improvements.

#20 Curriculum-Based Self-Training Makes Better Few-Shot Learners for Data-to-Text Generation [PDF] [Copy] [Kimi]

Authors: Pei Ke ; Haozhe Ji ; Zhenyu Yang ; Yi Huang ; Junlan Feng ; Xiaoyan Zhu ; Minlie Huang

Despite the success of text-to-text pre-trained models in various natural language generation (NLG) tasks, the generation performance is largely restricted by the number of labeled data in downstream tasks, particularly in data-to-text generation tasks. Existing works mostly utilize abundant unlabeled structured data to conduct unsupervised pre-training for task adaption, which fail to model the complex relationship between source structured data and target texts. Thus, we introduce self-training as a better few-shot learner than task-adaptive pre-training, which explicitly captures this relationship via pseudo-labeled data generated by the pre-trained model. To alleviate the side-effect of low-quality pseudo-labeled data during self-training, we propose a novel method called Curriculum-Based Self-Training (CBST) to effectively leverage unlabeled data in a rearranged order determined by the difficulty of text generation. Experimental results show that our method can outperform fine-tuning and task-adaptive pre-training methods, and achieve state-of-the-art performance in the few-shot setting of data-to-text generation.

#21 Deexaggeration [PDF] [Copy] [Kimi]

Authors: Li Kong ; Chuanyi Li ; Vincent Ng

We introduce a new task in hyperbole processing, deexaggeration, which concerns the recovery of the meaning of what is being exaggerated in a hyperbolic sentence in the form of a structured representation. In this paper, we lay the groundwork for the computational study of understanding hyperbole by (1) defining a structured representation to encode what is being exaggerated in a hyperbole in a non-hyperbolic manner, (2) annotating the hyperbolic sentences in two existing datasets, HYPO and HYPO-cn, using this structured representation, (3) conducting an empirical analysis of our annotated corpora, and (4) presenting preliminary results on the deexaggeration task.

#22 Taylor, Can You Hear Me Now? A Taylor-Unfolding Framework for Monaural Speech Enhancement [PDF] [Copy] [Kimi]

Authors: Andong Li ; Shan You ; Guochen Yu ; Chengshi Zheng ; Xiaodong Li

While the deep learning techniques promote the rapid development of the speech enhancement (SE) community, most schemes only pursue the performance in a black-box manner and lack adequate model interpretability. Inspired by Taylor's approximation theory, we propose an interpretable decoupling-style SE framework, which disentangles the complex spectrum recovery into two separate optimization problems i.e., magnitude and complex residual estimation. Specifically, serving as the 0th-order term in Taylor's series, a filter network is delicately devised to suppress the noise component only in the magnitude domain and obtain a coarse spectrum. To refine the phase distribution, we estimate the sparse complex residual, which is defined as the difference between target and coarse spectra, and measures the phase gap. In this study, we formulate the residual component as the combination of various high-order Taylor terms and propose a lightweight trainable module to replace the complicated derivative operator between adjacent terms. Finally, following Taylor's formula, we can reconstruct the target spectrum by the superimposition between 0th-order and high-order terms. Experimental results on two benchmark datasets show that our framework achieves state-of-the-art performance over previous competing baselines in various evaluation metrics. The source code is available at https://github.com/Andong-Li-speech/TaylorSENet.

#23 FastRE: Towards Fast Relation Extraction with Convolutional Encoder and Improved Cascade Binary Tagging Framework [PDF] [Copy] [Kimi]

Authors: Guozheng Li ; Xu Chen ; Peng Wang ; Jiafeng Xie ; Qiqing Luo

Recent work for extracting relations from texts has achieved excellent performance. However, most existing methods pay less attention to the efficiency, making it still challenging to quickly extract relations from massive or streaming text data in realistic scenarios. The main efficiency bottleneck is that these methods use a Transformer-based pre-trained language model for encoding, which heavily affects the training speed and inference speed. To address this issue, we propose a fast relation extraction model (FastRE) based on convolutional encoder and improved cascade binary tagging framework. Compared to previous work, FastRE employs several innovations to improve efficiency while also keeping promising performance. Concretely, FastRE adopts a novel convolutional encoder architecture combined with dilated convolution, gated unit and residual connection, which significantly reduces the computation cost of training and inference, while maintaining the satisfactory performance. Moreover, to improve the cascade binary tagging framework, FastRE first introduces a type-relation mapping mechanism to accelerate tagging efficiency and alleviate relation redundancy, and then utilizes a position-dependent adaptive thresholding strategy to obtain higher tagging accuracy and better model generalization. Experimental results demonstrate that FastRE is well balanced between efficiency and performance, and achieves 3-10$\times$ training speed, 7-15$\times$ inference speed faster, and 1/100 parameters compared to the state-of-the-art models, while the performance is still competitive. Our code is available at \url{https://github.com/seukgcode/FastRE}.

#24 Neutral Utterances are Also Causes: Enhancing Conversational Causal Emotion Entailment with Social Commonsense Knowledge [PDF] [Copy] [Kimi]

Authors: Jiangnan Li ; Fandong Meng ; Zheng Lin ; Rui Liu ; Peng Fu ; Yanan Cao ; Weiping Wang ; Jie Zhou

Conversational Causal Emotion Entailment aims to detect causal utterances for a non-neutral targeted utterance from a conversation. In this work, we build conversations as graphs to overcome implicit contextual modelling of the original entailment style. Following the previous work, we further introduce the emotion information into graphs. Emotion information can markedly promote the detection of causal utterances whose emotion is the same as the targeted utterance. However, it is still hard to detect causal utterances with different emotions, especially neutral ones. The reason is that models are limited in reasoning causal clues and passing them between utterances. To alleviate this problem, we introduce social commonsense knowledge (CSK) and propose a Knowledge Enhanced Conversation graph (KEC). KEC propagates the CSK between two utterances. As not all CSK is emotionally suitable for utterances, we therefore propose a sentiment-realized knowledge selecting strategy to filter CSK. To process KEC, we further construct the Knowledge Enhanced Directed Acyclic Graph networks. Experimental results show that our method outperforms baselines and infers more causes with different emotions from the targeted utterance.

#25 Domain-Adaptive Text Classification with Structured Knowledge from Unlabeled Data [PDF] [Copy] [Kimi]

Authors: Tian Li ; Xiang Chen ; Zhen Dong ; Kurt Keutzer ; Shanghang Zhang

Domain adaptive text classification is a challenging problem for the large-scale pretrained language models because they often require expensive additional labeled data to adapt to new domains. Existing works usually fails to leverage the implicit relationships among words across domains. In this paper, we propose a novel method, called Domain Adaptation with Structured Knowledge (DASK), to enhance domain adaptation by exploiting word-level semantic relationships. DASK first builds a knowledge graph to capture the relationship between pivot terms (domain-independent words) and non-pivot terms in the target domain. Then during training, DASK injects pivot-related knowledge graph information into source domain texts. For the downstream task, these knowledge-injected texts are fed into a BERT variant capable of processing knowledge-injected textual data. Thanks to the knowledge injection, our model learns domain-invariant features for non-pivots according to their relationships with pivots. DASK ensures the pivots to have domain-invariant behaviors by dynamically inferring via the polarity scores of candidate pivots during training with pseudo-labels. We validate DASK on a wide range of cross-domain sentiment classification tasks and observe up to 2.9% absolute performance improvement over baselines for 20 different domain pairs. Code is available at https://github.com/hikaru-nara/DASK.