| Total: 111
This abstract proposes an approach towards goal-oriented modeling of the detection and modeling complex social phenomena in multiparty discourse in an online political strategy game. We developed a two-tier approach that first encodes sociolinguistic behavior as linguistic features then use reinforcement learning to estimate the advantage afforded to any player. In the first tier, sociolinguistic behavior, such as Friendship and Reasoning, that speakers use to influence others are encoded as linguistic features to identify the persuasive strategies applied by each player in simultaneous two-party dialogues. In the second tier, a reinforcement learning approach is used to estimate a graph-aware reward function to quantify the advantage afforded to each player based on their standing in this multiparty setup. We apply this technique to the game Diplomacy, using a dataset comprising of over 15,000 messages exchanged between 78 users. Our graph-aware approach shows robust performance compared to a context-agnostic setup.
NLP applications for code-mixed (CM) or mix-lingual text have gained a significant momentum recently, the main reason being the prevalence of language mixing in social media communications in multi-lingual societies like India, Mexico, Europe, parts of USA etc. Word embeddings are basic building blocks of any NLP system today, yet, word embedding for CM languages is an unexplored territory. The major bottleneck for CM word embeddings is switching points, where the language switches. These locations lack in contextually and statistical systems fail to model this phenomena due to high variance in the seen examples. In this paper we present our initial observations on applying switching point based positional encoding techniques for CM language, specifically Hinglish (Hindi - English). Results are only marginally better than SOTA, but it is evident that positional encoding could be an effective way to train position sensitive language models for CM text.
We present a deep neural network (DNN) for visually classifying whether a person is wearing a protective face mask. Our DNN can be deployed on a resource-limited, sub-10-cm nano-drone: this robotic platform is an ideal candidate to fly in human proximity and perform ubiquitous visual perception safely. This paper describes our pipeline, starting from the dataset collection; the selection and training of a full-precision (i.e., float32) DNN; a quantization phase (i.e., int8), enabling the DNN's deployment on a parallel ultra-low power (PULP) system-on-chip aboard our target nano-drone. Results demonstrate the efficacy of our pipeline with a mean area under the ROC curve score of 0.81, which drops by only ~2% when quantized to 8-bit for deployment.
Out-of-distribution (O.O.D.) generalization remains to be a key challenge for real-world machine learning systems. We describe a method for O.O.D. generalization that, through training, encourages models to only preserve features in the network that are well reused across multiple training domains. Our method combines two complementary neuron-level regularizers with a probabilistic differentiable binary mask over the network, to extract a modular sub-network that achieves better O.O.D. performance than the original network. Preliminary evaluation on two benchmark datasets corroborates the promise of our method.
We introduce a model-agnostic algorithm for manipulating SHapley Additive exPlanations (SHAP) with perturbation of tabular data. It is evaluated on predictive tasks from healthcare and financial domains to illustrate how crucial is the context of data distribution in interpreting machine learning models. Our method supports checking the stability of the explanations used by various stakeholders apparent in the domain of responsible AI; moreover, the result highlights the explanations' vulnerability that can be exploited by an adversary.
We present a generative neural model for open and multi-turn dialog response generation that relies on a multi-dimension attention process to account for the semantic interdependence between the generated words and the conversational history, so as to identify all the words and utterances that influence each generated response. The performance of the model is evaluated on the wide scope DailyDialog corpus and a comparison is made with two other generative neural architectures, using several machine metrics. The results show that the proposed model improves the state of the art for generation accuracy, and its multi-dimension attention allows for a more detailed tracking of the influential words and utterances in the dialog history for response explainability by the dialog history.
Computing devices continue to be increasingly spread out within our everyday environments. Computers are embedded into everyday devices in order to serve the functionality of electronic components or to enable new services in their own right. Existing Substitution-Permutation Network (SPN) ciphers, such as the Advanced Encryption Standard (AES), are not suitable for devices where memory, power consumption or processing power is limited. Lightweight SPN ciphers, such as GIFT-128 provide a solution for running cryptography on low resource devices. The GIFT-128 cryptographic scheme is a building block for GIFT-COFB (Authenticated Encryption with Associated Data), one of the finalists in the ongoing NIST lightweight cryptography standardization process (NISTIR 8369). Determination of an adequate level of security and providing subsequent mechanisms to achieve it, is one of the most pressing problems regarding embedded computing devices. In this paper we present experimental results and comparative study of Deep Learning (DL) based Side Channel Attacks on lightweight GIFT-128. To our knowledge, this is the first study of the security of GIFT-128 against DL-based SCA attacks.
Deep learning is a promising avenue to automate tedious analysis tasks in biomedical imaging. However, its application in such a context is limited by the large amount of labeled data required to train deep learning models. While active learning may be used to reduce the amount of labeling data, many approaches do not consider the cost of annotating, which is often significant in a biomedical imaging setting. In this work we show how annotation cost can be considered and learned during active learning on a classification task on the MNIST dataset.
We propose INDEPROP, a novel Natural Language Processing (NLP) application for combating online disinformation by mitigating propaganda from news articles. INDEPROP (Information-Preserving De-propagandization) involves fine-grained propaganda detection and its removal while maintaining document level coherence, grammatical correctness and most importantly, preserving the news articles’ information content. We curate the first large-scale dataset of its kind consisting of around 1M tokens. We also propose a set of automatic evaluation metrics for the same and observe its high correlation with human judgment. Furthermore, we show that fine-tuning the existing propaganda detection systems on our dataset considerably improves their generalization to the test set.
Fossil energy products such as liquefied natural gas (LNG) are among Canada's most important exports. Canadian engineers devote themselves to constructing visual surveillance systems for detecting potential LNG emissions in energy facilities. Beyond the previous infrared (IR) surveillance system, in this paper, a multimodal fusion-based LNG detection (MFLNGD) framework is proposed to enhance the detection quality by the integration of IR and visible (VI) cameras. Besides, a Fourier transformer is developed to fuse IR and VI features better. The experimental results suggest the effectiveness of the proposed framework.
Information diffusion in social networks is a well-studied concept in social choice theory. We propose the study of the diffusion of two secrets in a heterogeneous environment from the complexity perspective, that is, there are two different networks with the same set of agents (e.g., the structure of the set of followers might be different in two distinct social networks). Formally, our model combines two group identification processes for which we do have independent desiderata---either constructive, where we would like a given group of agents to be exposed to a secret, or destructive, where a given group of agents should not be exposed to a secret. To be able to reach these targets, we can either delete an agent or introduce a previously latent agent. Our results are mostly negative---all of the problems are NP-hard. Therefore, we propose a parameterized study with respect to the natural parameters, the number of influenced agents, the size of the required/protected agent sets, and the duration of the diffusion process. Most of the studied problems remain W[1]-hard even for a combination of these parameters. We complement these results with nearly optimal XP algorithms.
Referring expression comprehension aims at grounding the object in an image referred to by the expression. Scene text that serves as an identifier has a natural advantage in referring to objects. However, existing methods only consider the text in the expression, but ignore the text in the image, leading to a mismatch. In this paper, we propose a novel model that can recognize the scene text. We assign the extracted scene text to its corresponding visual region and ground the target object guided by expression. Experimental results on two benchmarks demonstrate the effectiveness of our model.
This paper considers a multi-state Log Gaussian Cox Process (`"LGCP'') on a graph, where transmissions amongst states are calibrated using a non-parametric approach. We thus consider multi-output LGCPs and introduce numerical approximations to compute posterior distributions extremely quickly and in a completely transparent and reproducible fashion. The model is tested on historical data and shows very good performance.
Convolutional neural network (CNN) based image segmentation has been widely used in analyzing medical images and benefited many real-world disease diagnosis applications. However, existing advanced CNN-based medical image segmentation models usually contain numerous parameters that require massive computation and memory, limiting the applicability of these models in the data-constrained or hardware-constrained environments. By leveraging the recently proposed neural architecture search (NAS), this paper presents a novel approach, dubbed Thrifty NAS, to design computation and memory-efficient models for medical image segmentation automatically. The searched models by Thrifty NAS are with much fewer parameters while retaining competitive performance. More specifically, we design a micro level space for cell structure search and a macro level cell path for better network structure modeling. Extensive experimental results in different medical image datasets verify the effectiveness of the proposed method with competitive segmentation performance, especially with minuscule neural architecture model size, i.e., 0.61M that is superior to U-Net (7.76 M) and UNet++ (9.04 M).
This paper exploits self-supervised learning (SSL) to learn more accurate and robust representations from the user-item interaction graph. Particularly, we propose a novel SSL model that effectively leverages contrastive multi-view learning and pseudo-siamese network to construct a pre-training and post-training framework. Moreover, we present three graph augmentation techniques during the pre-training stage and explore the effects of combining different augmentations, which allow us to learn general and robust representations for the GNN-based recommendation. Simple experimental evaluations on real-world datasets show that the proposed solution significantly improves the recommendation accuracy, especially for sparse data, and is also noise resistant.
Social media, blogs, and online articles are instant sources of news for internet users globally. But due to their unmoderated nature, a significant percentage of these texts are fake news or rumors. Their deceptive nature and ability to propagate instantly can have an adverse effect on society. In this work, we hypothesize that legitimacy of news has a correlation with its emotion, and propose a multi-task framework predicting both the emotion and legitimacy of news. Experimental results verify that our multi-task models outperform their single-task counterparts in terms of accuracy.
This paper studies the over-parameterization of deep neural networks using the Fisher Information Matrix from information geometry. We identify several surprising trends in the structure of its eigenspectrum, and how this structure relates to the eigenspectrum of the data correlation matrix. We identify how the eigenspectrum relates to the topology of the predictions of the model and develop a "model reduction'' method for deep networks. This ongoing investigation hypothesizes certain universal trends in the FIM of deep networks that may shed light on their effectiveness.
We propose a novel method for transforming the emotional content in an image to a specified target emotion. Existing techniques such as a single generative adversarial network (GAN) struggle to perform well on unconstrained images, especially when data is limited. Our method addresses this limitation by blending the outputs from two networks to better transform fine details (e.g., faces) while still operating on the broader styles of the full image. We demonstrate our method's potential through a proof-of-concept implementation.
Capturing visual similarity among images is the core of many computer vision and pattern recognition tasks. This problem can be formulated in such a paradigm called metric learning. Most research in the area has been mainly focusing on improving the loss functions and similarity measures. However, due to the ignoring of geometric structure, existing methods often lead to sub-optimal results. Thus, several recent research methods took advantage of Wasserstein distance between batches of samples to characterize the spacial geometry. Although these approaches can achieve enhanced performance, the aggregation over batches definitely hinders Wasserstein distance's superior measure capability and leads to high computational complexity. To address this limitation, we propose a novel Deep Wasserstein Metric Learning framework, which employs Wasserstein distance to precisely capture the relationship among various images under ranking-based loss functions such as contrastive loss and triplet loss. Our method directly computes the distance between images, considering the geometry at a finer granularity than batch level. Furthermore, we introduce a new efficient algorithm using Sinkhorn approximation and Wasserstein measure coreset. The experimental results demonstrate the improvements of our framework over various baselines in different applications and benchmark datasets.
A 6-hour early detection of sepsis leads to a significant increase in the chance of surviving it. Previous sepsis early detection studies have focused on improving the performance of supervised learning algorithms while ignoring the potential correlation in data mining, and there was no reliable method to deal with the problem of incomplete data. In this paper, we proposed the Denoising Transformer AutoEncoder (DTAE) for the first time combining transformer and unsupervised learning. DTAE can learn the correlation of the features required for early detection of sepsis without the label. This method can effectively solve the problems of data sparsity and noise and discover the potential correlation of features by adding DTAE enhancement module without modifying the existing algorithms. Finally, the experimental results show that the proposed method improves the existing algorithms and achieves the best results of early detection.
Lack of explainability in artificial intelligence, specifically deep neural networks, remains a bottleneck for implementing models in practice. Popular techniques such as Gradient-weighted Class Activation Mapping (Grad-CAM) provide a coarse map of salient features in an image, which rarely tells the whole story of what a convolutional neural network(CNN) learned. Using COVID-19 chest X-rays, we present a method for interpreting what a CNN has learned by utilizing Generative Adversarial Networks (GANs). Our GAN framework disentangles lung structure from COVID-19 features. Using this GAN, we can visualize the transition of a pair of COVID negative lungs in a chest radiograph to a COVID positive pair by interpolating in the latent space of the GAN, which provides fine-grained visualization of how the CNN responds to varying features within the lungs.
To mitigate a malware threat it is important to understand the malware’s behavior. The MITRE ATT&ACK ontology specifies an enumeration of tactics, techniques, and procedures (TTP) that characterize malware. However, absent are automated procedures that would characterize, given the malware executable, which part of the execution flow is connected with a specific TTP. This paper provides an automation methodology to locate TTP in a sub-part of the control flow graph that describes the execution flow of a malware executable. This methodology merges graph representation learning and tools for machine learning explanation.
Understanding the emerging behaviors of reinforcement learning agents may be difficult because such agents are often trained using highly complex and expressive models. In recent years, most approaches developed for explaining agent behaviors rely on domain knowledge or on an analysis of the agent’s learned policy. For some domains, relevant knowledge may not be available or may be insufficient for producing meaningful explanations. We suggest using formal model abstractions and transforms, previously used mainly for expediting the search for optimal policies, to automatically explain discrepancies that may arise between the behavior of an agent and the behavior that is anticipated by an observer. We formally define this problem of Reinforcement Learning Policy Explanation(RLPE), suggest a class of transforms which can be used for explaining emergent behaviors, and suggest meth-ods for searching efficiently for an explanation. We demonstrate the approach on standard benchmarks.
We study the robustness of ridge regression, lasso regression, and of a neural network, when the training set has been randomly corrupted and in response to this corruption the training-size is reduced in order to remove the corrupted data. While the neural network appears to be the most robust method among these three, nevertheless lasso regression appears to be the method of choice since it suffers less loss both when the full information is available to the learner, as well as when a significant amount of the original training set has been rendered useless because of random data corruption.
Existing scene graph generation methods suffer the limitations when the image lacks of sufficient visual contexts. To address this limitation, we propose a knowledge-enhanced scene graph generation model with multimodal relation alignment, which supplements the missing visual contexts by well-aligned textual knowledge. First, we represent the textual information into contextualized knowledge which is guided by the visual objects to enhance the contexts. Furthermore, we align the multimodal relation triplets by co-attention module for better semantics fusion. The experimental results show the effectiveness of our method.