IJCAI.2023 - Natural Language Processing | Cool Papers

#1 Keep Skills in Mind: Understanding and Implementing Skills in Commonsense Question Answering [PDF²] [Copy] [Kimi²] [REL]

Authors: Meikai Bao, Qi Liu, Kai Zhang, Ye Liu, Linan Yue, Longfei Li, Jun Zhou

Commonsense Question Answering (CQA) aims to answer questions that require human commonsense. Closed-book CQA, as one of the subtasks, requires the model to answer questions without retrieving external knowledge, which emphasizes the importance of the model's problem-solving ability. Most previous methods relied on large-scale pre-trained models to generate question-related knowledge while ignoring the crucial role of skills in the process of answering commonsense questions. Generally, skills refer to the learned ability in performing a specific task or activity, which are derived from knowledge and experience. In this paper, we introduce a new approach named Dynamic Skill-aware Commonsense Question Answering (DSCQA), which transcends the limitations of traditional methods by informing the model about the need for each skill in questions and utilizes skills as a critical driver in CQA process. To be specific, DSCQA first employs commonsense skill extraction module to generate various skill representations. Then, DSCQA utilizes dynamic skill module to generate dynamic skill representations. Finally, in perception and emphasis module, various skills and dynamic skill representations are used to help question-answering process. Experimental results on two publicly available CQA datasets show the effectiveness of our proposed model and the considerable impact of introducing skills.

Subject: IJCAI.2023 - Natural Language Processing

#2 An Effective and Efficient Time-aware Entity Alignment Framework via Two-aspect Three-view Label Propagation [PDF¹] [Copy] [Kimi¹] [REL]

Authors: Li Cai, Xin Mao, Youshao Xiao, Changxu Wu, Man Lan

Entity alignment (EA) aims to find the equivalent entity pairs between different knowledge graphs (KGs), which is crucial to promote knowledge fusion. With the wide use of temporal knowledge graphs (TKGs), time-aware EA (TEA) methods appear to enhance EA. Existing TEA models are based on Graph Neural Networks (GNN) and achieve state-of-the-art (SOTA) performance, but it is difficult to transfer them to large-scale TKGs due to the scalability issue of GNN. In this paper, we propose an effective and efficient non-neural EA framework between TKGs, namely LightTEA, which consists of four essential components: (1) Two-aspect Three-view Label Propagation, (2) Sparse Similarity with Temporal Constraints, (3) Sinkhorn Operator, and (4) Temporal Iterative Learning. All of these modules work together to improve the performance of EA while reducing the time consumption of the model. Extensive experiments on public datasets indicate that our proposed model significantly outperforms the SOTA methods for EA between TKGs, and the time consumed by LightTEA is only dozens of seconds at most, no more than 10% of the most efficient TEA method.

Subject: IJCAI.2023 - Natural Language Processing

#3 One Model for All Domains: Collaborative Domain-Prefix Tuning for Cross-Domain NER [PDF¹] [Copy] [Kimi¹] [REL]

Authors: Xiang Chen, Lei Li, Shuofei Qiao, Ningyu Zhang, Chuanqi Tan, Yong Jiang, Fei Huang, Huajun Chen

Cross-domain NER is a challenging task to address the low-resource problem in practical scenarios. Previous typical solutions mainly obtain a NER model by pre-trained language models (PLMs) with data from a rich-resource domain and adapt it to the target domain. Owing to the mismatch issue among entity types in different domains, previous approaches normally tune all parameters of PLMs, ending up with an entirely new NER model for each domain. Moreover, current models only focus on leveraging knowledge in one general source domain while failing to successfully transfer knowledge from multiple sources to the target. To address these issues, we introduce Collaborative Domain-Prefix Tuning for cross-domain NER (CP-NER) based on text-to-text generative PLMs. Specifically, we present text-to-text generation grounding domain-related instructors to transfer knowledge to new domain NER tasks without structural modifications. We utilize frozen PLMs and conduct collaborative domain-prefix tuning to stimulate the potential of PLMs to handle NER tasks across various domains. Experimental results on the Cross-NER benchmark show that the proposed approach has flexible transfer ability and performs better on both one-source and multiple-source cross-domain NER tasks.

Subject: IJCAI.2023 - Natural Language Processing

#4 Less Learn Shortcut: Analyzing and Mitigating Learning of Spurious Feature-Label Correlation [PDF] [Copy] [Kimi] [REL]

Authors: Yanrui Du, Jing Yan, Yan Chen, Jing Liu, Sendong Zhao, Qiaoqiao She, Hua Wu, Haifeng Wang, Bing Qin

Recent research has revealed that deep neural networks often take dataset biases as a shortcut to make decisions rather than understand tasks, leading to failures in real-world applications. In this study, we focus on the spurious correlation between word features and labels that models learn from the biased data distribution of training data. In particular, we define the word highly co-occurring with a specific label as biased word, and the example containing biased word as biased example. Our analysis shows that biased examples are easier for models to learn, while at the time of prediction, biased words make a significantly higher contribution to the models' predictions, and models tend to assign predicted labels over-relying on the spurious correlation between words and labels. To mitigate models' over-reliance on the shortcut (i.e. spurious correlation), we propose a training strategy Less-Learn-Shortcut (LLS): our strategy quantifies the biased degree of the biased examples and down-weights them accordingly. Experimental results on Question Matching, Natural Language Inference and Sentiment Analysis tasks show that LLS is a task-agnostic strategy and can improve the model performance on adversarial data while maintaining good performance on in-domain data.

Subject: IJCAI.2023 - Natural Language Processing

#5 KEST: Kernel Distance Based Efficient Self-Training for Improving Controllable Text Generation [PDF] [Copy] [Kimi] [REL]

Authors: Yuxi Feng, Xiaoyuan Yi, Laks V.S. Lakshmanan, Xing Xie

Self-training (ST) has come to fruition in language understanding tasks by producing pseudo labels, which reduces the labeling bottleneck of language model fine-tuning. Nevertheless, in facilitating semi-supervised controllable language generation, ST faces two key challenges. First, augmented by self-generated pseudo text, generation models tend to over-exploit the previously learned text distribution, suffering from mode collapse and poor generation diversity. Second, generating pseudo text in each iteration is time-consuming, severely decelerating the training process. In this work, we propose KEST, a novel and efficient self-training framework to handle these problems. KEST utilizes a kernel-based loss, rather than standard cross entropy, to learn from the soft pseudo text produced by a shared non-autoregressive generator. We demonstrate both theoretically and empirically that KEST can benefit from more diverse pseudo text in an efficient manner, which allows not only refining and exploiting the previously fitted distribution but also enhanced exploration towards a larger potential text space, providing a guarantee of improved performance. Experiments on three controllable generation tasks demonstrate that KEST significantly improves control accuracy while maintaining comparable text fluency and generation diversity against several strong baselines.

Subject: IJCAI.2023 - Natural Language Processing

#6 Regularisation for Efficient Softmax Parameter Generation in Low-Resource Text Classifiers [PDF] [Copy] [Kimi] [REL]

Authors: Daniel Grießhaber, Johannes Maucher, Ngoc Thang Vu

Meta-learning has made tremendous progress in recent years and was demonstrated to be particularly suitable in low-resource settings where training data is very limited. However, meta-learning models still require large amounts of training tasks to achieve good generalisation. Since labelled training data may be sparse, self-supervision-based approaches are able to further improve performance on downstream tasks. Although no labelled data is necessary for this training, a large corpus of unlabelled text needs to be available. In this paper, we improve on recent advances in meta-learning for natural language models that allow training on a diverse set of training tasks for few-shot, low-resource target tasks. We introduce a way to generate new training data with the need for neither more supervised nor unsupervised datasets. We evaluate the method on a diverse set of NLP tasks and show that the model decreases in performance when trained on this data without further adjustments. Therefore, we introduce and evaluate two methods for regularising the training process and show that they not only improve performance when used in conjunction with the new training data but also improve average performance when training only on the original data, compared to the baseline.

Subject: IJCAI.2023 - Natural Language Processing

#7 SmartBERT: A Promotion of Dynamic Early Exiting Mechanism for Accelerating BERT Inference [PDF] [Copy] [Kimi¹] [REL]

Authors: Boren Hu, Yun Zhu, Jiacheng Li, Siliang Tang

Dynamic early exiting has been proven to improve the inference speed of the pre-trained language model like BERT. However, all samples must go through all consecutive layers before early exiting and more complex samples usually go through more layers, which still exists redundant computation. In this paper, we propose a novel dynamic early exiting combined with layer skipping for BERT inference named SmartBERT, which adds a skipping gate and an exiting operator into each layer of BERT. SmartBERT can adaptively skip some layers and adaptively choose whether to exit. Besides, we propose cross-layer contrastive learning and combine it into our training phases to boost the intermediate layers and classifiers which would be beneficial for early exiting. To keep the inconsistent usage of skipping gates between training and inference phases, we propose a hard weight mechanism during training phase. We conduct experiments on eight classification datasets of the GLUE benchmark. Experimental results show that SmartBERT achieves 2-3× computation reduction with minimal accuracy drops compared with BERT and our method outperforms previous methods in both efficiency and accuracy. Moreover, in some complex datasets, we prove that the early exiting based on entropy hardly works, and the skipping mechanism is essential for reducing computation.

Subject: IJCAI.2023 - Natural Language Processing

#8 Cross-Modal Global Interaction and Local Alignment for Audio-Visual Speech Recognition [PDF¹] [Copy] [Kimi¹] [REL]

Authors: Yuchen Hu, Ruizhe Li, Chen Chen, Heqing Zou, Qiushi Zhu, Eng Siong Chng

Audio-visual speech recognition (AVSR) research has gained a great success recently by improving the noise-robustness of audio-only automatic speech recognition (ASR) with noise-invariant visual information. However, most existing AVSR approaches simply fuse the audio and visual features by concatenation, without explicit interactions to capture the deep correlations between them, which results in sub-optimal multimodal representations for downstream speech recognition task. In this paper, we propose a cross-modal global interaction and local alignment (GILA) approach for AVSR, which captures the deep audio-visual (A-V) correlations from both global and local perspectives. Specifically, we design a global interaction model to capture the A-V complementary relationship on modality level, as well as a local alignment approach to model the A-V temporal consistency on frame level. Such a holistic view of cross-modal correlations enable better multimodal representations for AVSR. Experiments on public benchmarks LRS3 and LRS2 show that our GILA outperforms the supervised learning state-of-the-art. Code is at https://github.com/YUCHEN005/GILA.

Subject: IJCAI.2023 - Natural Language Processing

#9 Explainable Text Classification via Attentive and Targeted Mixing Data Augmentation [PDF²] [Copy] [Kimi¹] [REL]

Authors: Songhao Jiang, Yan Chu, Zhengkui Wang, Tianxing Ma, Hanlin Wang, Wenxuan Lu, Tianning Zang, Bo Wang

Mixing data augmentation methods have been widely used in text classification recently. However, existing methods do not control the quality of augmented data and have low model explainability. To tackle these issues, this paper proposes an explainable text classification solution based on attentive and targeted mixing data augmentation, ATMIX. Instead of selecting data for augmentation without control, ATMIX focuses on the misclassified training samples as the target for augmentation to better improve the model's capability. Meanwhile, to generate meaningful augmented samples, it adopts a self-attention mechanism to understand the importance of the subsentences in a text, and cut and mix the subsentences between the misclassified and correctly classified samples wisely. Furthermore, it employs a novel dynamic augmented data selection framework based on the loss function gradient to dynamically optimize the augmented samples for model training. In the end, we develop a new model explainability evaluation method based on subsentence attention and conduct extensive evaluations over multiple real-world text datasets. The results indicate that ATMIX is more effective with higher explainability than the typical classification models, hidden-level, and input-level mixup models.

Subject: IJCAI.2023 - Natural Language Processing

#10 ScriptWorld: Text Based Environment for Learning Procedural Knowledge [PDF] [Copy] [Kimi] [REL]

Authors: Abhinav Joshi, Areeb Ahmad, Umang Pandey, Ashutosh Modi

Text-based games provide a framework for developing natural language understanding and commonsense knowledge about the world in reinforcement learning based agents. Existing text-based environments often rely on fictional situations and characters to create a gaming framework and are far from real-world scenarios. In this paper, we introduce ScriptWorld: a text-based environment for teaching agents about real-world daily chores and hence imparting commonsense knowledge. To the best of our knowledge, it is the first interactive text-based gaming framework that consists of daily real-world human activities designed using scripts dataset. We provide gaming environments for 10 daily activities and perform a detailed analysis of the proposed environment. We develop RL-based baseline models/agents to play the games in ScriptWorld. To understand the role of language models in such environments, we leverage features obtained from pre-trained language models in the RL agents. Our experiments show that prior knowledge obtained from a pre-trained language model helps to solve real-world text-based gaming environments.

Subject: IJCAI.2023 - Natural Language Processing

#11 Towards Incremental NER Data Augmentation via Syntactic-aware Insertion Transformer [PDF³] [Copy] [Kimi¹] [REL]

Authors: Wenjun Ke, Zongkai Tian, Qi Liu, Peng Wang, Jinhua Gao, Rui Qi

Named entity recognition (NER) aims to locate and classify named entities in natural language texts. Most existing high-performance NER models employ a supervised paradigm, which requires a large quantity of high-quality annotated data during training. In order to help NER models perform well in few-shot scenarios, data augmentation approaches attempt to build extra data by means of random editing or by using end-to-end generation with PLMs. However, these methods focus on only the fluency of generated sentences, ignoring the syntactic correlation between the new and raw sentences. Such uncorrelation also brings low diversity and inconsistent labeling of synthetic samples. To fill this gap, we present SAINT (Syntactic-Aware InsertioN Transformer), a hard-constraint controlled text generation model that incorporates syntactic information. The proposed method operates by inserting new tokens between existing entities in a parallel manner. During insertion procedure, new tokens will be added taking both semantic and syntactic factors into account. Hence the resulting sentence can retain the syntactic correctness with respect to the raw data. Experimental results on two benchmark datasets, i.e., Ontonotes and Wikiann, demonstrate the comparable performance of SAINT over the state-of-the-art baselines.

Subject: IJCAI.2023 - Natural Language Processing

#12 Towards Lossless Head Pruning through Automatic Peer Distillation for Language Models [PDF] [Copy] [Kimi] [REL]

Authors: Bingbing Li, Zigeng Wang, Shaoyi Huang, Mikhail Bragin, Ji Li, Caiwen Ding

Pruning has been extensively studied in Transformer-based language models to improve efficiency. Typically, we zero (prune) unimportant model weights and train a derived compact model to improve final accuracy. For pruned weights, we treat them as useless and discard them. This usually leads to significant model accuracy degradation. In this paper, we focus on attention head pruning as head attention is a key component of the transformer-based language models and provides interpretable knowledge meaning. We reveal the relationship between pruned attention heads and retained heads and provide a solution to recycle the discarded knowledge from the pruned heads, named peer distillation. We also develop an automatic framework to locate the to-be-pruned attention heads in each layer, freeing the time-consuming human labor in tuning hyperparameters.Experimental results on the General Language Understanding Evaluation (GLUE) benchmark are provided using BERT model. By recycling discarded knowledge from pruned heads, the proposed method maintains model performance across all nine tasks while reducing heads by over 58% on average and outperforms state-of-the-art techniques (e.g., Random, HISP, L0 Norm, SMP).

Subject: IJCAI.2023 - Natural Language Processing

#13 Annealing Genetic-based Preposition Substitution for Text Rubbish Example Generation [PDF] [Copy] [Kimi] [REL]

Authors: Chen Li, Xinghao Yang, Baodi Liu, Weifeng Liu, Honglong Chen

Modern Natural Language Processing (NLP) models expose under-sensitivity towards text rubbish examples. The text rubbish example is the heavily modified input text which is nonsensical to humans but does not change the model’s prediction. Prior work crafts rubbish examples by iteratively deleting words and determining the deletion order with beam search. However, the produced rubbish examples usually cause a reduction in model confidence and sometimes deliver human-readable text. To address these problems, we propose an Annealing Genetic based Preposition Substitution (AGPS) algorithm for text rubbish sample generation with two major merits. Firstly, the AGPS crafts rubbish text examples by substituting input words with meaningless prepositions instead of directly removing them, which brings less degradation to the model’s confidence. Secondly, we design an Annealing Genetic algorithm to optimize the word replacement priority, which allows the Genetic Algorithm (GA) to jump out the local optima with probabilities. This is significant in achieving better objectives, i.e., a high word modification rate and a high model confidence. Experimental results on five popular datasets manifest the superiority of AGPS compared with the baseline and expose the fact: the NLP models can not really understand the semantics of sentences, as they give the same prediction with even higher confidence for the nonsensical preposition sequences.

Subject: IJCAI.2023 - Natural Language Processing

#14 iRe2f: Rethinking Effective Refinement in Language Structure Prediction via Efficient Iterative Retrospecting and Reasoning [PDF] [Copy] [Kimi] [REL]

Authors: Zuchao Li, Xingyi Guo, Letian Peng, Lefei Zhang, Hai Zhao

Refinement plays a critical role in language structure prediction, a process that deals with complex situations such as structural edge interdependencies. Since language structure prediction usually modeled as graph parsing, typical refinement methods involve taking an initial parsing graph as input and refining it using language input and other relevant information. Intuitively, a refinement component, i.e., refiner, should be lightweight and efficient, as it is only responsible for correcting faults in the initial graph. However, current refiners add a significant burden to the parsing process due to their reliance on time-consuming encoding-decoding procedure on the language input and graph. To make the refiner more practical for real-world applications, this paper proposes a lightweight but effective iterative refinement framework, iRe^2f, based on iterative retrospecting and reasoning without involving the re-encoding process on the graph. iRe^2f iteratively refine the parsing graph based on interaction between graph and sequence and efficiently learns the shortcut to update the sequence and graph representations in each iteration. The shortcut is calculated based on the graph representation in the latest iteration. iRe^2f reduces the number of refinement parameters by 90% compared to the previous smallest refiner. Experiments on a variety of language structure prediction tasks show that iRe^2f performs comparably or better than current state-of-the-art refiners, with a significant increase in efficiency.

Subject: IJCAI.2023 - Natural Language Processing

#15 Local and Global: Temporal Question Answering via Information Fusion [PDF¹] [Copy] [Kimi¹] [REL]

Authors: Yonghao Liu, Di Liang, Mengyu Li, Fausto Giunchiglia, Ximing Li, Sirui Wang, Wei Wu, Lan Huang, Xiaoyue Feng, Renchu Guan

Many models that leverage knowledge graphs (KGs) have recently demonstrated remarkable success in question answering (QA) tasks. In the real world, many facts contained in KGs are time-constrained thus temporal KGQA has received increasing attention. Despite the fruitful efforts of previous models in temporal KGQA, they still have several limitations. (I) They neither emphasize the graph structural information between entities in KGs nor explicitly utilize a multi-hop relation path through graph neural networks to enhance answer prediction. (II) They adopt pre-trained language models (LMs) to obtain question representations, focusing merely on the global information related to the question while not highlighting the local information of the entities in KGs. To address these limitations, we introduce a novel model that simultaneously explores both Local information and Global information for the task of temporal KGQA (LGQA). Specifically, we first introduce an auxiliary task in the temporal KG embedding procedure to make timestamp embeddings time-order aware. Then, we design information fusion layers that effectively incorporate local and global information to deepen question understanding. We conduct extensive experiments on two benchmarks, and LGQA significantly outperforms previous state-of-the-art models, especially in difficult questions. Moreover, LGQA can generate interpretable and trustworthy predictions.

Subject: IJCAI.2023 - Natural Language Processing

#16 PPAT: Progressive Graph Pairwise Attention Network for Event Causality Identification [PDF] [Copy] [Kimi¹] [REL]

Authors: Zhenyu Liu, Baotian Hu, Zhenran Xu, Min Zhang

Event Causality Identification (ECI) aims to identify the causality between a pair of event mentions in a document, which is composed of sentence-level ECI (SECI) and document-level ECI (DECI). Previous work applies various reasoning models to identify the implicit event causality. However, they indiscriminately reason all event causality in the same way, ignoring that most inter-sentence event causality depends on intra-sentence event causality to infer. In this paper, we propose a progressive graph pairwise attention network (PPAT) to consider the above dependence. PPAT applies a progressive reasoning strategy, as it first predicts the intra-sentence event causality, and then infers the more implicit inter-sentence event causality based on the SECI result. We construct a sentence boundary event relational graph, and PPAT leverages a simple pairwise attention mechanism, which attends to different reasoning chains on the graph. In addition, we propose a causality-guided training strategy for assisting PPAT in learning causality-related representations on every layer. Extensive experiments show that our model achieves state-of-the-art performance on three benchmark datasets (5.5%, 2.2% and 4.5% F1 gains on EventStoryLine, MAVEN-ERE and Causal-TimeBank). Code is available at https://github.com/HITsz-TMG/PPAT.

Subject: IJCAI.2023 - Natural Language Processing

#17 Meta-Tsallis-Entropy Minimization: A New Self-Training Approach for Domain Adaptation on Text Classification [PDF¹] [Copy] [Kimi] [REL]

Authors: Menglong Lu, Zhen Huang, Zhiliang Tian, Yunxiang Zhao, Xuanyu Fei, Dongsheng Li

Text classification is a fundamental task for natural language processing, and adapting text classification models across domains has broad applications. Self-training generates pseudo-examples from the model's predictions and iteratively trains on the pseudo-examples, i.e., minimizes the loss on the source domain and the Gibbs entropy on the target domain. However, Gibbs entropy is sensitive to prediction errors, and thus, self-training tends to fail when the domain shift is large. In this paper, we propose Meta-Tsallis Entropy minimization (MTEM). MTEM uses an instance adaptive Tsallis entropy to replace the Gibbs entropy and a meta-learning algorithm to optimize the instance adaptive Tsallis entropy on the target domain. To reduce the computation cost of MTEM, we propose an approximation technique to approximate the second-order derivation involved in the meta-learning. To efficiently generate pseudo labels, we propose an annealing sampling mechanism for exploring the model's prediction probability. Theoretically, we prove the convergence of the meta-learning algorithm in MTEM and analyze the effectiveness of MTEM in achieving domain adaptation. Experimentally, MTEM improves the adaptation performance of BERT with an average of 4 percent on the benchmark dataset.

Subject: IJCAI.2023 - Natural Language Processing

#18 ODEE: A One-Stage Object Detection Framework for Overlapping and Nested Event Extraction [PDF] [Copy] [Kimi] [REL]

Authors: Jinzhong Ning, Zhihao Yang, Zhizheng Wang, Yuanyuan Sun, Hongfei Lin

The task of extracting overlapping and nested events has received significant attention in recent times, as prior research has primarily focused on extracting flat events, overlooking the intricacies of overlapping and nested occurrences. In this work, we present a new approach to Event Extraction (EE) by reformulating it as an object detection task on a table of token pairs. Our proposed one-stage event extractor, called ODEE, can handle overlapping and nested events. The model is designed with a vertex-based tagging scheme and two auxiliary tasks of predicting the spans and types of event trigger words and argument entities, leveraging the full span information of event elements. Furthermore, in the training stage, we introduce a negative sampling method for table cells to address the imbalance problem of positive and negative table cell tags, meanwhile improving computational efficiency. Empirical evaluations demonstrate that ODEE achieves the state-of-the-art performance on three benchmarks for overlapping and nested EE (i.e., FewFC, Genia11, and Genia13). Furthermore, ODEE outperforms current state-of-the-art methods in terms of both number of parameters and inference speed, indicating its high computational efficiency. To facilitate future research in this area, the codes are publicly available at https://github.com/NingJinzhong/ODEE.

Subject: IJCAI.2023 - Natural Language Processing

#19 Learning to Speak from Text: Zero-Shot Multilingual Text-to-Speech with Unsupervised Text Pretraining [PDF] [Copy] [Kimi] [REL]

Authors: Takaaki Saeki, Soumi Maiti, Xinjian Li, Shinji Watanabe, Shinnosuke Takamichi, Hiroshi Saruwatari

While neural text-to-speech (TTS) has achieved human-like natural synthetic speech, multilingual TTS systems are limited to resource-rich languages due to the need for paired text and studio-quality audio data. This paper proposes a method for zero-shot multilingual TTS using text-only data for the target language. The use of text-only data allows the development of TTS systems for low-resource languages for which only textual resources are available, making TTS accessible to thousands of languages. Inspired by the strong cross-lingual transferability of multilingual language models, our framework first performs masked language model pretraining with multilingual text-only data. Then we train this model with a paired data in a supervised manner, while freezing a language-aware embedding layer. This allows inference even for languages not included in the paired data but present in the text-only data. Evaluation results demonstrate highly intelligible zero-shot TTS with a character error rate of less than 12% for an unseen language.

Subject: IJCAI.2023 - Natural Language Processing

#20 Case-Based Reasoning with Language Models for Classification of Logical Fallacies [PDF] [Copy] [Kimi] [REL]

Authors: Zhivar Sourati, Filip Ilievski, Hông-Ân Sandlin, Alain Mermoud

The ease and speed of spreading misinformation and propaganda on the Web motivate the need to develop trustworthy technology for detecting fallacies in natural language arguments. However, state-of-the-art language modeling methods exhibit a lack of robustness on tasks like logical fallacy classification that require complex reasoning. In this paper, we propose a Case-Based Reasoning method that classifies new cases of logical fallacy by language-modeling-driven retrieval and adaptation of historical cases. We design four complementary strategies to enrich input representation for our model, based on external information about goals, explanations, counterarguments, and argument structure. Our experiments in in-domain and out-of-domain settings indicate that Case-Based Reasoning improves the accuracy and generalizability of language models. Our ablation studies suggest that representations of similar cases have a strong impact on the model performance, that models perform well with fewer retrieved cases, and that the size of the case database has a negligible effect on the performance. Finally, we dive deeper into the relationship between the properties of the retrieved cases and the model performance.

Subject: IJCAI.2023 - Natural Language Processing

#21 Fine-tuned vs. Prompt-tuned Supervised Representations: Which Better Account for Brain Language Representations? [PDF¹] [Copy] [Kimi] [REL]

Authors: Jingyuan Sun, Marie-Francine Moens

To decipher the algorithm underlying the human brain's language representation, previous work probed brain responses to language input with pre-trained artificial neural network (ANN) models fine-tuned on NLU tasks. However, full fine-tuning generally updates the entire parametric space and distorts pre-trained features, cognitively inconsistent with the brain's robust multi-task learning ability. Prompt-tuning, in contrast, protects pre-trained weights and learns task-specific embeddings to fit a task. Could prompt-tuning generate representations that better account for the brain's language representations than fine-tuning? If so, what kind of NLU task leads a pre-trained model to better decode the information represented in the human brain? We investigate these questions by comparing prompt-tuned and fine-tuned representations in neural decoding, that is predicting the linguistic stimulus from the brain activities evoked by the stimulus. We find that on none of the 10 NLU tasks, full fine-tuning significantly outperforms prompt-tuning in neural decoding, implicating that a more brain-consistent tuning method yields representations that better correlate with brain data. Moreover, we identify that tasks dealing with fine-grained concept meaning yield representations that better decode brain activation patterns than other tasks, especially the syntactic chunking task. This indicates that our brain encodes more fine-grained concept information than shallow syntactic information when representing languages.

Subject: IJCAI.2023 - Natural Language Processing

#22 SQuAD-SRC: A Dataset for Multi-Accent Spoken Reading Comprehension [PDF] [Copy] [Kimi] [REL]

Authors: Yixuan Tang, Anthony K.H: Tung

Spoken Reading Comprehension (SRC) is a challenging problem in spoken natural language retrieval, which automatically extracts the answer from the text-form contents according to the audio-form question. However, the existing spoken question answering approaches are mainly based on synthetically generated audio-form data, which may be ineffectively applied for multi-accent spoken question answering directly in many real-world applications. In this paper, we construct a large-scale multi-accent human spoken dataset SQuAD-SRC, in order to study the problem of multi-accent spoken reading comprehension. We choose 24 native English speakers from six different countries with various English accents and construct audio-form questions to the correspondent text-form contents by the chosen speakers. The dataset consists of 98,169 spoken question answering pairs and 20,963 passages from the popular machine reading comprehension dataset SQuAD. We present a statistical analysis of our SQuAD-SRC dataset and conduct extensive experiments on it by comparing cascaded SRC approaches and the enhanced end-to-end ones. Moreover, we explore various adaption strategies to improve the SRC performance, especially for multi-accent spoken questions.

Subject: IJCAI.2023 - Natural Language Processing

#23 PasCore: A Chinese Overlapping Relation Extraction Model Based on Global Pointer Annotation Strategy [PDF²] [Copy] [Kimi²] [REL]

Authors: Peng Wang, Jiafeng Xie, Xiye Chen, Guozheng Li, Wei Li

Recent work for extracting relations from texts has achieved excellent performance. However, existing studies mainly focus on simple relation extraction, these methods perform not well on overlapping triple problem because the tags of shared entities would conflict with each other. Especially, overlapping entities are common and indispensable in Chinese. To address this issue, this paper proposes PasCore, which utilizes a global pointer annotation strategy for overlapping relation extraction in Chinese. PasCore first obtains the sentence vector via general pre-training model encoder, and uses classifier to predicate relations. Subsequently, it uses global pointer annotation strategy for head entity annotation, which uses global tags to label the start and end positions of the entities. Finally, PasCore integrates the relation, head entity and its type to mark the tail entity. Furthermore, PasCore performs conditional layer normalization to fuse features, which connects all stages and greatly enriches the association between relations and entities. Experimental results on both Chinese and English real-world datasets demonstrate that PasCore outperforms strong baselines on relation extraction and, especially, shows superior performance on overlapping relation extraction.

Subject: IJCAI.2023 - Natural Language Processing

#24 Privacy-Preserving End-to-End Spoken Language Understanding [PDF] [Copy] [Kimi] [REL]

Authors: Yinggui Wang, Wei Huang, Le Yang

Spoken language understanding (SLU), one of the key enabling technologies for human-computer interaction in IoT devices, provides an easy-to-use user interface. Human speech can contain a lot of user-sensitive information, such as gender, identity, and sensitive content. New types of security and privacy breaches have thus emerged. Users do not want to expose their personal sensitive information to malicious attacks by untrusted third parties. Thus, the SLU system needs to ensure that a potential malicious attacker cannot deduce the sensitive attributes of the users, while it should avoid greatly compromising the SLU accuracy. To address the above challenge, this paper proposes a novel SLU multi-task privacy-preserving model to prevent both the speech recognition (ASR) and identity recognition (IR) attacks. The model uses the hidden layer separation technique so that SLU information is distributed only in a specific portion of the hidden layer, and the other two types of information are removed to obtain a privacy-secure hidden layer. In order to achieve good balance between efficiency and privacy, we introduce a new mechanism of model pre-training, namely joint adversarial training, to further enhance the user privacy. Experiments over two SLU datasets show that the proposed method can reduce the accuracy of both the ASR and IR attacks close to that of a random guess, while leaving the SLU performance largely unaffected.

Subject: IJCAI.2023 - Natural Language Processing

#25 Beyond Pure Text: Summarizing Financial Reports Based on Both Textual and Tabular Data [PDF] [Copy] [Kimi¹] [REL]

Authors: Ziao Wang, Zelin Jiang, Xiaofeng Zhang, Jaehyeon Soon, Jialu Zhang, Wang Xiaoyao, Hongwei Du

Abstractive text summarization is to generate concise summaries that well preserve both salient information and the overall semantic meanings of the given documents. However, real-world documents, e.g., financial reports, generally contain rich data such as charts and tabular data which invalidates most existing text summarization approaches. This paper is thus motivated to propose this novel approach to simultaneously summarize both textual and tabular data. Particularly, we first manually construct a “table+text → summary” dataset. Then, the tabular data is respectively embedded in a row-wise and column-wise manner, and the textual data is encoded at the sentence-level via an employed pre-trained model. We propose a salient detector gate respectively performed between each pair of row/column and sentence embeddings. The highly correlated content is considered as salient information that must be summarized. Extensive experiments have been performed on our constructed dataset and the promising results demonstrate the effectiveness of the proposed approach w.r.t. a number of both automatic and human evaluation criteria.

Subject: IJCAI.2023 - Natural Language Processing