IJCAI.2021 - Machine Learning Applications

Total: 13

#1 Toward Optimal Solution for the Context-Attentive Bandit Problem

Authors: Djallel Bouneffouf ; Raphael Feraud ; Sohini Upadhyay ; Irina Rish ; Yasaman Khazaeni

In various recommender system applications, from medical diagnosis to dialog systems, due to observation costs only a small subset of a potentially large number of context variables can be observed at each iteration; however, the agent has the freedom to choose which variables to observe. In this paper, we analyze and extend an online learning framework known as Context-Attentive Bandit. We derive a novel algorithm, called Context-Attentive Thompson Sampling (CATS), which builds upon the Linear Thompson Sampling approach, adapting it to the Context-Attentive Bandit setting. We provide a theoretical regret analysis and an extensive empirical evaluation demonstrating advantages of the proposed approach over several baseline methods on a variety of real-life datasets.

#2 Sample Efficient Decentralized Stochastic Frank-Wolfe Methods for Continuous DR-Submodular Maximization

Authors: Hongchang Gao ; Hanzi Xu ; Slobodan Vucetic

Continuous DR-submodular maximization is an important machine learning problem, which covers numerous popular applications. With the emergence of large-scale distributed data, developing efficient algorithms for continuous DR-submodular maximization, such as the decentralized Frank-Wolfe method, has become an important challenge. However, existing decentralized Frank-Wolfe methods for this kind of problem have a sample complexity of $\mathcal{O}(1/\epsilon^3)$, incurring a large computational overhead. In this paper, we propose two novel sample efficient decentralized Frank-Wolfe methods to address this challenge. Our theoretical results demonstrate that the sample complexity of the two proposed methods is $\mathcal{O}(1/\epsilon^2)$, which is better than the $\mathcal{O}(1/\epsilon^3)$ of the existing methods. As far as we know, this is the first published result achieving such a favorable sample complexity. Extensive experimental results confirm the effectiveness of the proposed methods.
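For readers unfamiliar with the Frank-Wolfe template this abstract builds on, the following is a minimal single-machine, exact-gradient sketch. It is not either of the paper's decentralized stochastic methods. The objective, the weights, the constraint set {x in [0,1]^n : sum(x) <= k}, and all function names are assumptions chosen for illustration.

```python
import math

def frank_wolfe_dr_submodular(grad, n, k, steps=100):
    """Continuous-greedy Frank-Wolfe variant on {x in [0,1]^n : sum(x) <= k}."""
    x = [0.0] * n
    for _ in range(steps):
        g = grad(x)
        # Linear maximization oracle: weight 1 on the k largest positive gradient entries.
        top = sorted(range(n), key=lambda i: g[i], reverse=True)[:k]
        v = [0.0] * n
        for i in top:
            if g[i] > 0:
                v[i] = 1.0
        # A step of size 1/steps makes the final x an average of feasible points.
        x = [xi + vi / steps for xi, vi in zip(x, v)]
    return x

# Toy monotone DR-submodular objective (assumed): separable and concave per coordinate.
WEIGHTS = [3.0, 1.0, 2.0, 0.5]

def objective(x):
    return sum(w * math.log1p(xi) for w, xi in zip(WEIGHTS, x))

def gradient(x):
    return [w / (1.0 + xi) for w, xi in zip(WEIGHTS, x)]

x_final = frank_wolfe_dr_submodular(gradient, n=4, k=2)
print([round(v, 3) for v in x_final], round(objective(x_final), 3))
```

On this toy problem the budget ends up on the two highest-weight coordinates. A stochastic variant would replace `gradient` with a noisy estimate, which is where sample complexity enters.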

#3 Self-Guided Community Detection on Networks with Missing Edges

Authors: Dongxiao He ; Shuai Li ; Di Jin ; Pengfei Jiao ; Yuxiao Huang

The vast majority of community detection algorithms assume that the networks are totally observed. However, in reality many networks cannot be fully observed. One such network is the edges-missing network, where some relationships (edges) between two entities are missing. Recently, several works have been proposed to solve this problem by combining link prediction and community detection in a two-stage method or in a unified framework. However, the goal of link prediction, which is to predict as many correct edges as possible, is not consistent with the requirement of predicting the edges that are important for discovering community structure on edges-missing networks. Thus, combining link prediction and community detection cannot work very well in terms of detecting community structure for edges-missing networks. In this paper, we propose a community self-guided generative model which jointly completes the edges-missing network and identifies communities. In our new model, completing missing edges and identifying communities are not isolated but closely intertwined. Furthermore, we develop an effective model inference method that combines a nested Expectation-Maximization (EM) algorithm and Metropolis-Hastings sampling. Extensive experiments on real-world edges-missing networks show that our model can effectively detect community structures while completing missing edges.

#4 Two-Sided Wasserstein Procrustes Analysis

Authors: Kun Jin ; Chaoyue Liu ; Cathy Xia

Learning correspondence between sets of objects is a key component in many machine learning tasks. Recently, optimal transport (OT) has been successfully applied to such correspondence problems and it is appealing as a fully unsupervised approach. However, OT requires that pairwise instances be directly comparable in a common metric space. This limits its applicability when feature spaces are of different dimensions or not directly comparable. In addition, OT only focuses on pairwise correspondence without sensing global transformations. To address these challenges, we propose a new method to jointly learn the optimal coupling between two sets, and the optimal transformations (e.g. rotation, projection and scaling) of each set based on a two-sided Wasserstein Procrustes analysis (TWP). Since the joint problem is a non-convex optimization problem, we present a reformulation that renders the problem component-wise convex. We then propose a novel algorithm to solve the problem harnessing a Gauss–Seidel method. We further present competitive results of TWP on various applications compared with state-of-the-art methods.
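As a rough illustration of the "optimal transformation" half of the problem above, the sketch below solves a rotation-only Procrustes step in two dimensions with the coupling held fixed (point i in one set is paired with point i in the other). It is not the TWP algorithm: the 2-D restriction, the fixed pairing, and the function names are all assumptions made to keep the example self-contained.

```python
import math

def best_rotation_2d(source, target):
    """Angle theta minimizing sum ||R(theta) s_i - t_i||^2 for paired 2-D points."""
    # Expanding the squared error leaves cos(theta)*dot + sin(theta)*cross to maximize.
    cross = sum(sx * ty - sy * tx for (sx, sy), (tx, ty) in zip(source, target))
    dot = sum(sx * tx + sy * ty for (sx, sy), (tx, ty) in zip(source, target))
    return math.atan2(cross, dot)

def rotate(points, theta):
    """Apply a counter-clockwise rotation by theta to each 2-D point."""
    c, s = math.cos(theta), math.sin(theta)
    return [(c * x - s * y, s * x + c * y) for x, y in points]

source = [(1.0, 0.0), (0.0, 2.0), (-1.0, 1.0)]
target = rotate(source, 0.7)           # ground-truth rotation, used only to build the toy data
theta_hat = best_rotation_2d(source, target)
aligned = rotate(source, theta_hat)
print(round(theta_hat, 6))
```

In an alternating (Gauss-Seidel style) scheme, a step like this would be interleaved with an update of the coupling while the transformation is held fixed.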

#5 Solving Math Word Problems with Teacher Supervision

Authors: Zhenwen Liang ; Xiangliang Zhang

Math word problems (MWPs) have recently been addressed with Seq2Seq models by "translating" math problems described in natural language into mathematical expressions, following a typical encoder-decoder structure. Although effective in solving classical math problems, these models fail when a subtle variation in the wording of a math problem leads to a remarkably different answer. We find that the failure arises because MWPs with different answers but similar math formula expressions are encoded closely in the latent space. We thus design a teacher module to make the MWP encoding vector match the correct solution and diverge from the wrong solutions, which are derived by manipulating the correct solution. Experimental results on two benchmark MWP datasets verify that our proposed solution outperforms the state-of-the-art models.

#6 Collaborative Graph Learning with Auxiliary Text for Temporal Event Prediction in Healthcare

Authors: Chang Lu ; Chandan K Reddy ; Prithwish Chakraborty ; Samantha Kleinberg ; Yue Ning

Accurate and explainable health event predictions are becoming crucial for healthcare providers to develop care plans for patients. The availability of electronic health records (EHR) has enabled machine learning advances in providing these predictions. However, many deep-learning-based methods are not satisfactory in solving several key challenges: 1) effectively utilizing disease domain knowledge; 2) collaboratively learning representations of patients and diseases; and 3) incorporating unstructured features. To address these issues, we propose a collaborative graph learning model to explore patient-disease interactions and medical domain knowledge. Our solution is able to capture structural features of both patients and diseases. The proposed model also utilizes unstructured text data by employing an attention manipulating strategy and then integrates attentive text features into a sequential learning process. We conduct extensive experiments on two important healthcare problems to show the competitive prediction performance of the proposed method compared with various state-of-the-art models. We also confirm the effectiveness of learned representations and model interpretability by a set of ablation and case studies.

#7 MDNN: A Multimodal Deep Neural Network for Predicting Drug-Drug Interaction Events

Authors: Tengfei Lyu ; Jianliang Gao ; Ling Tian ; Zhao Li ; Peng Zhang ; Ji Zhang

The interaction of multiple drugs could lead to serious events, which cause injuries and huge medical costs. Accurate prediction of drug-drug interaction (DDI) events can help clinicians make effective decisions and establish appropriate therapy programs. Recently, many AI-based techniques have been proposed for predicting DDI-associated events. However, most existing methods pay little attention to the potential correlations between DDI events and other multimodal data such as targets and enzymes. To address this problem, we propose a Multimodal Deep Neural Network (MDNN) for DDI event prediction. In MDNN, we design a two-pathway framework, including a drug knowledge graph (DKG) based pathway and a heterogeneous feature (HF) based pathway, to obtain drug multimodal representations. Finally, a multimodal fusion neural layer is designed to explore the complementarity among the drug multimodal representations. We conduct extensive experiments on a real-world dataset. The results show that MDNN can accurately predict DDI events and outperforms the state-of-the-art models.

#8 SPADE: A Semi-supervised Probabilistic Approach for Detecting Errors in Tables

Authors: Minh Pham ; Craig A. Knoblock ; Muhao Chen ; Binh Vu ; Jay Pujara

Error detection is one of the most important steps in data cleaning and usually requires extensive human interaction to ensure quality. Existing supervised methods in error detection require a significant amount of training data while unsupervised methods rely on fixed inductive biases, which are usually hard to generalize, to solve the problem. In this paper, we present SPADE, a novel semi-supervised probabilistic approach for error detection. SPADE introduces a novel probabilistic active learning model, where the system suggests examples to be labeled based on the agreements between user labels and indicative signals, which are designed to capture potential errors. SPADE uses a two-phase data augmentation process to enrich a dataset before training a deep learning classifier to detect unlabeled errors. In our evaluation, SPADE achieves an average F1-score of 0.91 over five datasets and yields a 10% improvement compared with the state-of-the-art systems.

#9 TEC: A Time Evolving Contextual Graph Model for Speaker State Analysis in Political Debates

Authors: Ramit Sawhney ; Shivam Agarwal ; Arnav Wadhwa ; Rajiv Shah

Political discourses provide a forum for representatives to express their opinions and contribute towards policy making. Analyzing these discussions is crucial for recognizing possible delegates and making better voting choices in an independent nation. A politician's vote on a proposition is usually associated with their past discourses and impacted by cohesion forces in political parties. We focus on predicting a speaker's vote on a bill by augmenting linguistic models with temporal and cohesion contexts. We propose TEC, a time evolving graph based model that jointly employs links between motions, speakers, and temporal politician states. TEC outperforms competitive models, illustrating the benefit of temporal and contextual signals for predicting a politician's stance.

#10 Adaptive Residue-wise Profile Fusion for Low Homologous Protein Secondary Structure Prediction Using External Knowledge

Authors: Qin Wang ; Jun Wei ; Boyuan Wang ; Zhen Li ; Sheng Wang ; Shuguang Cui

Protein secondary structure prediction (PSSP) is essential for protein function analysis. However, for low homologous proteins, PSSP suffers from insufficient input features. In this paper, we explicitly import external self-supervised knowledge for low homologous PSSP under the guidance of residue-wise (amino acid wise) profile fusion. In practice, we first demonstrate the superiority of the profile over the Position-Specific Scoring Matrix (PSSM) for low homologous PSSP. Based on this observation, we introduce novel self-supervised BERT features as the pseudo profile, which implicitly involves the residue distribution in all native discovered sequences as complementary features. Furthermore, a novel residue-wise attention is specially designed to adaptively fuse different features (i.e., original low-quality profile, BERT based pseudo profile), which not only takes full advantage of each feature but also avoids noise disturbance. Besides, a feature consistency loss is proposed to accelerate the model learning from multiple semantic levels. Extensive experiments confirm that our method outperforms state-of-the-art methods (e.g., by 4.7% for extremely low homologous cases on the BC40 dataset).

#11 Ordering-Based Causal Discovery with Reinforcement Learning

Authors: Xiaoqiang Wang ; Yali Du ; Shengyu Zhu ; Liangjun Ke ; Zhitang Chen ; Jianye Hao ; Jun Wang

Discovering causal relations among a set of variables is a long-standing question in many empirical sciences. Recently, Reinforcement Learning (RL) has achieved promising results in causal discovery from observational data. However, searching the space of directed graphs and enforcing acyclicity by implicit penalties tend to be inefficient and restrict the existing RL-based method to small scale problems. In this work, we propose a novel RL-based approach for causal discovery, by incorporating RL into the ordering-based paradigm. Specifically, we formulate the ordering search problem as a multi-step Markov decision process, implement the ordering generating process with an encoder-decoder architecture, and finally use RL to optimize the proposed model based on the reward mechanisms designed for each ordering. A generated ordering is then processed using variable selection to obtain the final causal graph. We analyze the consistency and computational complexity of the proposed method, and empirically show that a pretrained model can be exploited to accelerate training. Experimental results on both synthetic and real data sets show that the proposed method achieves a much improved performance over the existing RL-based method.
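The appeal of the ordering-based paradigm is that any graph read off a variable ordering is acyclic by construction, so no acyclicity penalty is needed. The sketch below illustrates only that final "ordering to graph" step. The dependence strengths, the threshold, and the function names are hypothetical stand-ins for a real variable selection procedure, and nothing here reflects the paper's RL model.

```python
def dag_from_ordering(ordering, keep_edge):
    """Return edges (parent, child) that run from earlier to later variables and pass keep_edge."""
    edges = []
    for pos, child in enumerate(ordering):
        # Only variables that precede `child` in the ordering may be its parents.
        for parent in ordering[:pos]:
            if keep_edge(parent, child):
                edges.append((parent, child))
    return edges

# Hypothetical pairwise dependence strengths standing in for a variable selection step.
strength = {("A", "B"): 0.9, ("A", "C"): 0.05, ("B", "C"): 0.7, ("C", "A"): 0.8}

def keep(parent, child):
    # Assumed rule: keep an edge when its strength exceeds an arbitrary threshold.
    return strength.get((parent, child), 0.0) > 0.1

ordering = ["A", "B", "C"]
edges = dag_from_ordering(ordering, keep)
print(edges)
```

The strong C to A entry is never considered because C follows A in the ordering, which is exactly how the ordering rules out cycles.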

#12 Boosting Offline Reinforcement Learning with Residual Generative Modeling

Authors: Hua Wei ; Deheng Ye ; Zhao Liu ; Hao Wu ; Bo Yuan ; Qiang Fu ; Wei Yang ; Zhenhui Li

Offline reinforcement learning (RL) tries to learn the near-optimal policy with recorded offline experience without online exploration. Current offline RL research includes: 1) generative modeling, i.e., approximating a policy using fixed data; and 2) learning the state-action value function. While most research focuses on the state-action function part through reducing the bootstrapping error in value function approximation induced by the distribution shift of training data, the effects of error propagation in generative modeling have been neglected. In this paper, we analyze the error in generative modeling. We propose AQL (action-conditioned Q-learning), a residual generative model to reduce policy approximation error for offline RL. We show that our method can learn more accurate policy approximations in different benchmark datasets. In addition, we show that the proposed offline RL method can learn more competitive AI agents in complex control tasks under the multiplayer online battle arena (MOBA) game, Honor of Kings.

#13 Multi-series Time-aware Sequence Partitioning for Disease Progression Modeling

Authors: Xi Yang ; Yuan Zhang ; Min Chi

Electronic healthcare records (EHRs) are comprehensive longitudinal collections of patient data that play a critical role in modeling disease progression to facilitate clinical decision-making. Based on EHRs, in this work, we focus on sepsis -- a broad syndrome that can develop from nearly all types of infections (e.g., influenza, pneumonia). The symptoms of sepsis, such as elevated heart rate, fever, and shortness of breath, are vague and common to other illnesses, making the modeling of its progression extremely challenging. Motivated by the recent success of a novel subsequence clustering approach, Toeplitz Inverse Covariance-based Clustering (TICC), we model sepsis progression as a subsequence partitioning problem and propose a Multi-series Time-aware TICC (MT-TICC), which incorporates the multi-series nature and irregular time intervals of EHRs. The effectiveness of MT-TICC is first validated via a case study using a real-world hand gesture dataset with ground-truth labels. We then apply it to sepsis progression modeling using EHRs. The results suggest that MT-TICC can significantly outperform competitive baseline models, including TICC. More importantly, it unveils interpretable patterns, which shed some light on better understanding sepsis progression.