IJCAI.2019 - Others | Cool Papers - Immersive Paper Discovery

#1 SparseSense: Human Activity Recognition from Highly Sparse Sensor Data-streams Using Set-based Neural Networks [PDF] [Copy] [Kimi] [REL]

Authors: Alireza Abedin, S. Hamid Rezatofighi, Qinfeng Shi, Damith C. Ranasinghe

Batteryless or so called passive wearables are providing new and innovative methods for human activity recognition (HAR), especially in healthcare applications for older people. Passive sensors are low cost, lightweight, unobtrusive and desirably disposable; attractive attributes for healthcare applications in hospitals and nursing homes. Despite the compelling propositions for sensing applications, the data streams from these sensors are characterised by high sparsity---the time intervals between sensor readings are irregular while the number of readings per unit time are often limited. In this paper, we rigorously explore the problem of learning activity recognition models from temporally sparse data. We describe how to learn directly from sparse data using a deep learning paradigm in an end-to-end manner. We demonstrate significant classification performance improvements on real-world passive sensor datasets from older people over the state-of-the-art deep learning human activity recognition models. Further, we provide insights into the model's behaviour through complementary experiments on a benchmark dataset and visualisation of the learned activity feature spaces.

#2 Governance by Glass-Box: Implementing Transparent Moral Bounds for AI Behaviour [PDF] [Copy] [Kimi¹] [REL]

Authors: Andrea Aler Tubella, Andreas Theodorou, Frank Dignum, Virginia Dignum

Artificial Intelligence (AI) applications are being used to predict and assess behaviour in multiple domains which directly affect human well-being. However, if AI is to improve people’s lives, then people must be able to trust it, by being able to understand what the system is doing and why. Although transparency is often seen as the requirement in this case, realistically it might not always be possible, whereas the need to ensure that the system operates within set moral bounds remains. In this paper, we present an approach to evaluate the moral bounds of an AI system based on the monitoring of its inputs and outputs. We place a ‘Glass-Box’ around the system by mapping moral values into explicit verifiable norms that constrain inputs and outputs, in such a way that if these remain within the box we can guarantee that the system adheres to the value. The focus on inputs and outputs allows for the verification and comparison of vastly different intelligent systems; from deep neural networks to agent-based systems. The explicit transformation of abstract moral values into concrete norms brings great benefits in terms of explainability; stakeholders know exactly how the system is interpreting and employing relevant abstract moral human values and calibrate their trust accordingly. Moreover, by operating at a higher level we can check the compliance of the system with different interpretations of the same value.

#3 Decision Making for Improving Maritime Traffic Safety Using Constraint Programming [PDF] [Copy] [Kimi] [REL]

Authors: Saumya Bhatnagar, Akshat Kumar, Hoong Chuin Lau

Maritime navigational safety is of utmost importance to prevent vessel collisions in heavily trafficked ports, and avoid environmental costs. In case of a likely near miss among vessels, port traffic controllers provide assistance for safely navigating the waters, often at very short lead times. A better strategy is to avoid such situations from even happening. To achieve this, we a) formalize the decision model for traffic hotspot mitigation including realistic maritime navigational features and constraints through consultations with domain experts; and b) develop a constraint programming based scheduling approach to mitigate hotspots. We model the problem as a variant of the resource constrained project scheduling problem to adjust vessel movement schedules such that the average delay is minimized and navigational safety constraints are also satisfied. We conduct a thorough evaluation on key performance indicators using real world data, and demonstrate the effectiveness of our approach in mitigating high-risk situations.

#4 Evaluating the Interpretability of the Knowledge Compilation Map: Communicating Logical Statements Effectively [PDF] [Copy] [Kimi] [REL]

Authors: Serena Booth, Christian Muise, Julie Shah

Knowledge compilation techniques translate propositional theories into equivalent forms to increase their computational tractability. But, how should we best present these propositional theories to a human? We analyze the standard taxonomy of propositional theories for relative interpretability across three model domains: highway driving, emergency triage, and the chopsticks game. We generate decision-making agents which produce logical explanations for their actions and apply knowledge compilation to these explanations. Then, we evaluate how quickly, accurately, and confidently users comprehend the generated explanations. We find that domain, formula size, and negated logical connectives significantly affect comprehension while formula properties typically associated with interpretability are not strong predictors of human ability to comprehend the theory.

#5 AI-powered Posture Training: Application of Machine Learning in Sitting Posture Recognition Using the LifeChair Smart Cushion [PDF] [Copy] [Kimi] [REL]

Authors: Katia Bourahmoune, Toshiyuki Amagasa

Humans spend on average more than half of their day sitting down. The ill-effects of poor sitting posture and prolonged sitting on physical and mental health have been extensively studied, and solutions for curbing this sedentary epidemic have received special attention in recent years. With the recent advances in sensing technologies and Artificial Intelligence (AI), sitting posture monitoring and correction is one of the key problems to address for enhancing human well-being using AI. We present the application of a sitting posture training smart cushion called LifeChair that combines a novel pressure sensing technology, a smartphone app interface and machine learning (ML) for real-time sitting posture recognition and seated stretching guidance. We present our experimental design for sitting posture and stretch pose data collection using our posture training system. We achieved an accuracy of 98.93% in detecting more than 13 different sitting postures using a fast and robust supervised learning algorithm. We also establish the importance of taking into account the divergence in user body mass index in posture monitoring. Additionally, we present the first ML-based human stretch pose recognition system for pressure sensor data and show its performance in classifying six common chair-bound stretches.

#6 Improving Law Enforcement Daily Deployment Through Machine Learning-Informed Optimization under Uncertainty [PDF] [Copy] [Kimi] [REL]

Authors: Jonathan Chase, Duc Thien Nguyen, Haiyang Sun, Hoong Chuin Lau

Urban law enforcement agencies are under great pressure to respond to emergency incidents effectively while operating within restricted budgets. Minutes saved on emergency response times can save lives and catch criminals, and a responsive police force can deter crime and bring peace of mind to citizens. To efficiently minimize the response times of a law enforcement agency operating in a dense urban environment with limited manpower, we consider in this paper the problem of optimizing the spatial and temporal deployment of law enforcement agents to predefined patrol regions in a real-world scenario informed by machine learning. To this end, we develop a mixed integer linear optimization formulation (MIP) to minimize the risk of failing response time targets. Given the stochasticity of the environment in terms of incident numbers, location, timing, and duration, we use Sample Average Approximation (SAA) to find a robust deployment plan. To overcome the sparsity of real data, samples are provided by an incident generator that learns the spatio-temporal distribution and demand parameters of incidents from a real world historical dataset and generates sets of training incidents accordingly. To improve runtime performance across multiple samples, we implement a heuristic based on Iterated Local Search (ILS), as the solution is intended to create deployment plans quickly on a daily basis. Experimental results demonstrate that ILS performs well against the integer model while offering substantial gains in execution time.

#7 Risk Assessment for Networked-guarantee Loans Using High-order Graph Attention Representation [PDF] [Copy] [Kimi] [REL]

Authors: Dawei Cheng, Yi Tu, Zhenwei Ma, Zhibin Niu, Liqing Zhang

Assessing and predicting the default risk of networked-guarantee loans is critical for the commercial banks and financial regulatory authorities. The guarantee relationships between the loan companies are usually modeled as directed networks. Learning the informative low-dimensional representation of the networks is important for the default risk prediction of loan companies, even for the assessment of systematic financial risk level. In this paper, we propose a high-order graph attention representation method (HGAR) to learn the embedding of guarantee networks. Because this financial network is different from other complex networks, such as social, language, or citation networks, we set the binary roles of vertices and define high-order adjacent measures based on financial domain characteristics. We design objective functions in addition to a graph attention layer to capture the importance of nodes. We implement a productive learning strategy and prove that the complexity is near-linear with the number of edges, which could scale to large datasets. Extensive experiments demonstrate the superiority of our model over state-of-the-art method. We also evaluate the model in a real-world loan risk control system, and the results validate the effectiveness of our proposed approaches.

#8 PI-Bully: Personalized Cyberbullying Detection with Peer Influence [PDF] [Copy] [Kimi] [REL]

Authors: Lu Cheng, Jundong Li, Yasin Silva, Deborah Hall, Huan Liu

Cyberbullying has become one of the most pressing online risks for adolescents and has raised serious concerns in society. Recent years have witnessed a surge in research aimed at developing principled learning models to detect cyberbullying behaviors. These efforts have primarily focused on building a single generic classification model to differentiate bullying content from normal (non-bullying) content among all users. These models treat users equally and overlook idiosyncratic information about users that might facilitate the accurate detection of cyberbullying. In this paper, we propose a personalized cyberbullying detection framework, PI-Bully, that draws on empirical findings from psychology highlighting unique characteristics of victims and bullies and peer influence from like-minded users as predictors of cyberbullying behaviors. Our framework is novel in its ability to model peer influence in a collaborative environment and tailor cyberbullying prediction for each individual user. Extensive experimental evaluations on real-world datasets corroborate the effectiveness of the proposed framework.

#9 The Price of Local Fairness in Multistage Selection [PDF] [Copy] [Kimi] [REL]

Authors: Vitalii Emelianov, George Arvanitakis, Nicolas Gast, Krishna Gummadi, Patrick Loiseau

The rise of algorithmic decision making led to active researches on how to define and guarantee fairness, mostly focusing on one-shot decision making. In several important applications such as hiring, however, decisions are made in multiple stage with additional information at each stage. In such cases, fairness issues remain poorly understood. In this paper we study fairness in k-stage selection problems where additional features are observed at every stage. We first introduce two fairness notions, local (per stage) and global (final stage) fairness, that extend the classical fairness notions to the k-stage setting. We propose a simple model based on a probabilistic formulation and show that the locally and globally fair selections that maximize precision can be computed via a linear program. We then define the price of local fairness to measure the loss of precision induced by local constraints; and investigate theoretically and empirically this quantity. In particular, our experiments show that the price of local fairness is generally smaller when the sensitive attribute is observed at the first stage; but globally fair selections are more locally fair when the sensitive attribute is observed at the second stage – hence in both cases it is often possible to have a selection that has a small price of local fairness and is close to locally fair.

#10 Enhancing Stock Movement Prediction with Adversarial Training [PDF⁵] [Copy] [Kimi] [REL]

Authors: Fuli Feng, Huimin Chen, Xiangnan He, Ji Ding, Maosong Sun, Tat-Seng Chua

This paper contributes a new machine learning solution for stock movement prediction, which aims to predict whether the price of a stock will be up or down in the near future. The key novelty is that we propose to employ adversarial training to improve the generalization of a neural network prediction model. The rationality of adversarial training here is that the input features to stock prediction are typically based on stock price, which is essentially a stochastic variable and continuously changed with time by nature. As such, normal training with static price-based features (e.g. the close price) can easily overfit the data, being insufficient to obtain reliable models. To address this problem, we propose to add perturbations to simulate the stochasticity of price variable, and train the model to work well under small yet intentional perturbations. Extensive experiments on two real-world stock data show that our method outperforms the state-of-the-art solution [Xu and Cohen, 2018] with 3.11% relative improvements on average w.r.t. accuracy, validating the usefulness of adversarial training for stock prediction task.

#11 Safe Contextual Bayesian Optimization for Sustainable Room Temperature PID Control Tuning [PDF] [Copy] [Kimi] [REL]

Authors: Marcello Fiducioso, Sebastian Curi, Benedikt Schumacher, Markus Gwerder, Andreas Krause

We tune one of the most common heating, ventilation, and air conditioning (HVAC) control loops, namely the temperature control of a room. For economical and environmental reasons, it is of prime importance to optimize the performance of this system. Buildings account from 20 to 40 % of a country energy consumption, and almost 50 % of it comes from HVAC systems. Scenario projections predict a 30 % decrease in heating consumption by 2050 due to efficiency increase. Advanced control techniques can improve performance; however, the proportional-integral-derivative (PID) control is typically used due to its simplicity and overall performance. We use Safe Contextual Bayesian Optimization to optimize the PID parameters without human intervention. We reduce costs by 32 % compared to the current PID controller setting while assuring safety and comfort to people in the room. The results of this work have an immediate impact on the room control loop performances and its related commissioning costs. Furthermore, this successful attempt paves the way for further use at different levels of HVAC systems, with promising energy, operational, and commissioning costs savings, and it is a practical demonstration of the positive effects that Artificial Intelligence can have on environmental sustainability.

#12 DDL: Deep Dictionary Learning for Predictive Phenotyping [PDF] [Copy] [Kimi] [REL]

Authors: Tianfan Fu, Trong Nghia Hoang, Cao Xiao, Jimeng Sun

Predictive phenotyping is about accurately predicting what phenotypes will occur in the next clinical visit based on longitudinal Electronic Health Record (EHR) data. Several deep learning (DL) models have demonstrated great performance in predictive phenotyping. However, these DL-based phenotyping models requires access to a large amount of labeled data, which are often expensive to acquire. To address this label-insufficient challenge, we propose a deep dictionary learning framework (DDL) for phenotyping, which utilizes unlabeled data as a complementary source of information to generate a better, more succinct data representation. With extensive experiments on multiple real-world EHR datasets, we demonstrated DDL can outperform the state of the art predictive phenotyping methods on a wide variety of clinical tasks that require patient phenotyping such as heart failure classification, mortality prediction, and sequential prediction. All empirical results consistently show that unlabeled data can indeed be used to generate better data representation, which helps improve DDL's phenotyping performance over existing baseline methods that only uses labeled data.

#13 Improving Customer Satisfaction in Bike Sharing Systems through Dynamic Repositioning [PDF] [Copy] [Kimi] [REL]

Authors: Supriyo Ghosh, Jing Yu Koh, Patrick Jaillet

In bike sharing systems (BSSs), the uncoordinated movements of customers using bikes lead to empty or congested stations, which causes a significant loss in customer demand. In order to reduce the lost demand, a wide variety of existing research has employed a fixed set of historical demand patterns to design efficient bike repositioning solutions. However, the progress remains slow in understanding the underlying uncertainties in demand and designing proactive robust bike repositioning solutions. To bridge this gap, we propose a dynamic bike repositioning approach based on a probabilistic satisficing method which uses the uncertain demand parameters that are learnt from historical data. We develop a novel and computationally efficient mixed integer linear program for maximizing the probability of satisfying the uncertain demand so as to improve the overall customer satisfaction and efficiency of the system. Extensive experimental results from a simulation model built on a real-world bike sharing data set demonstrate that our approach is not only robust to uncertainties in customer demand, but also outperforms the existing state-of-the-art repositioning approaches in terms of reducing the expected lost demand.

#14 mdfa: Multi-Differential Fairness Auditor for Black Box Classifiers [PDF] [Copy] [Kimi] [REL]

Authors: Xavier Gitiaux, Huzefa Rangwala

Machine learning algorithms are increasingly involved in sensitive decision-making processes with adversarial implications on individuals. This paper presents a new tool, mdfa that identifies the characteristics of the victims of a classifier's discrimination. We measure discrimination as a violation of multi-differential fairness. Multi-differential fairness is a guarantee that a black box classifier's outcomes do not leak information on the sensitive attributes of a small group of individuals. We reduce the problem of identifying worst-case violations to matching distributions and predicting where sensitive attributes and classifier's outcomes coincide. We apply mdfa to a recidivism risk assessment classifier widely used in the United States and demonstrate that for individuals with little criminal history, identified African-Americans are three-times more likely to be considered at high risk of violent recidivism than similar non-African-Americans.

#15 CounterFactual Regression with Importance Sampling Weights [PDF¹] [Copy] [Kimi¹] [REL]

Authors: Negar Hassanpour, Russell Greiner

Perhaps the most pressing concern of a patient diagnosed with cancer is her life expectancy under various treatment options. For a binary-treatment case, this translates into estimating the difference between the outcomes (e.g., survival time) of the two available treatment options – i.e., her Individual Treatment Effect (ITE). This is especially challenging to estimate from observational data, as that data has selection bias: the treatment assigned to a patient depends on that patient's attributes. In this work, we borrow ideas from domain adaptation to address the distributional shift between the source (outcome of the administered treatment, appearing in the observed training data) and target (outcome of the alternative treatment) that exists due to selection bias. We propose a context-aware importance sampling re-weighing scheme, built on top of a representation learning module, for estimating ITEs. Empirical results on two publicly available benchmarks demonstrate that the proposed method significantly outperforms state-of-the-art.

#16 MINA: Multilevel Knowledge-Guided Attention for Modeling Electrocardiography Signals [PDF] [Copy] [Kimi] [REL]

Authors: Shenda Hong, Cao Xiao, Tengfei Ma, Hongyan Li, Jimeng Sun

Electrocardiography (ECG) signals are commonly used to diagnose various cardiac abnormalities. Recently, deep learning models showed initial success on modeling ECG data, however they are mostly black-box, thus lack interpretability needed for clinical usage. In this work, we propose MultIlevel kNowledge-guided Attention networks (MINA) that predict heart diseases from ECG signals with intuitive explanation aligned with medical knowledge. By extracting multilevel (beat-, rhythm- and frequency-level) domain knowledge features separately, MINA combines the medical knowledge and ECG data via a multilevel attention model, making the learned models highly interpretable. Our experiments showed MINA achieved PR-AUC 0.9436 (outperforming the best baseline by 5.51%) in real world ECG dataset. Finally, MINA also demonstrated robust performance and strong interpretability against signal distortion and noise contamination.

#17 RDPD: Rich Data Helps Poor Data via Imitation [PDF] [Copy] [Kimi] [REL]

Authors: Shenda Hong, Cao Xiao, Trong Nghia Hoang, Tengfei Ma, Hongyan Li, Jimeng Sun

In many situations, we need to build and deploy separate models in related environments with different data qualities. For example, an environment with strong observation equipments (e.g., intensive care units) often provides high-quality multi-modal data, which are acquired from multiple sensory devices and have rich-feature representations. On the other hand, an environment with poor observation equipment (e.g., at home) only provides low-quality, uni-modal data with poor-feature representations. To deploy a competitive model in a poor-data environment without requiring direct access to multi-modal data acquired from a rich-data environment, this paper develops and presents a knowledge distillation (KD) method (RDPD) to enhance a predictive model trained on poor data using knowledge distilled from a high-complexity model trained on rich, private data. We evaluated RDPD on three real-world datasets and shown that its distilled model consistently outperformed all baselines across all datasets, especially achieving the greatest performance improvement over a model trained only on low-quality data by 24.56% on PR-AUC and 12.21% on ROC-AUC, and over that of a state-of-the-art KD model by 5.91% on PR-AUC and 4.44% on ROC-AUC.

#18 Systematic Conservation Planning for Sustainable Land-use Policies: A Constrained Partitioning Approach to Reserve Selection and Design. [PDF] [Copy] [Kimi] [REL]

Authors: Dimitri Justeau-Allaire, Philippe Vismara, Philippe Birnbaum, Xavier Lorca

Faced with natural habitat degradation, fragmentation, and destruction, it is a major challenge for environmental managers to implement sustainable land use policies promoting socioeconomic development and natural habitat conservation in a balanced way. Relying on artificial intelligence and operational research, reserve selection and design models can be of assistance. This paper introduces a partitioning approach based on Constraint Programming (CP) for the reserve selection and design problem, dealing with both coverage and complex spatial constraints. Moreover, it introduces the first CP formulation of the buffer zone constraint, which can be reused to compose more complex spatial constraints. This approach has been evaluated in a real-world dataset addressing the problem of forest fragmentation in New Caledonia, a biodiversity hotspot where managers are gaining interest in integrating these methods into their decisional processes. Through several scenarios, it showed expressiveness, flexibility, and ability to quickly find solutions to complex questions.

#19 Truly Batch Apprenticeship Learning with Deep Successor Features [PDF] [Copy] [Kimi] [REL]

Authors: Donghun Lee, Srivatsan Srinivasan, Finale Doshi-Velez

We introduce a novel apprenticeship learning algorithm to learn an expert's underlying reward structure in off-policy model-free batch settings. Unlike existing methods that require hand-crafted features, on-policy evaluation, further data acquisition for evaluation policies or the knowledge of model dynamics, our algorithm requires only batch data (demonstrations) of the observed expert behavior. Such settings are common in many real-world tasks---health care, finance, or industrial process control---where accurate simulators do not exist and additional data acquisition is costly. We develop a transition-regularized imitation learning model to learn a rich feature representation and a near-expert initial policy that makes the subsequent batch inverse reinforcement learning process viable. We also introduce deep successor feature networks that perform off-policy evaluation to estimate feature expectations of candidate policies. Under the batch setting, our method achieves superior results on control benchmarks as well as a real clinical task of sepsis management in the Intensive Care Unit.

#20 Scribble-to-Painting Transformation with Multi-Task Generative Adversarial Networks [PDF] [Copy] [Kimi] [REL]

Authors: Jinning Li, Yexiang Xue

We propose the Dual Scribble-to-Painting Network (DSP-Net), which is able to produce artistic paintings based on user-generated scribbles. In scribble-to-painting transformation, a neural net has to infer additional details of the image, given relatively sparse information contained in the outlines of the scribble. Therefore, it is more challenging than classical image style transfer, in which the information content is reduced from photos to paintings. Inspired by the human cognitive process, we propose a multi-task generative adversarial network, which consists of two jointly trained neural nets -- one for generating artistic images and the other one for semantic segmentation. We demonstrate that joint training on these two tasks brings in additional benefit. Experimental result shows that DSP-Net outperforms state-of-the-art models both visually and quantitatively. In addition, we publish a large dataset for scribble-to-painting transformation.

#21 Diversity-Inducing Policy Gradient: Using Maximum Mean Discrepancy to Find a Set of Diverse Policies [PDF] [Copy] [Kimi] [REL]

Authors: Muhammad Masood, Finale Doshi-Velez

Standard reinforcement learning methods aim to master one way of solving a task whereas there may exist multiple near-optimal policies. Being able to identify this collection of near-optimal policies can allow a domain expert to efficiently explore the space of reasonable solutions. Unfortunately, existing approaches that quantify uncertainty over policies are not ultimately relevant to finding policies with qualitatively distinct behaviors. In this work, we formalize the difference between policies as a difference between the distribution of trajectories induced by each policy, which encourages diversity with respect to both state visitation and action choices. We derive a gradient-based optimization technique that can be combined with existing policy gradient methods to now identify diverse collections of well-performing policies. We demonstrate our approach on benchmarks and a healthcare task.

#22 KitcheNette: Predicting and Ranking Food Ingredient Pairings using Siamese Neural Network [PDF] [Copy] [Kimi] [REL]

Authors: Donghyeon Park, Keonwoo Kim, Yonggyu Park, Jungwoon Shin, Jaewoo Kang

As a vast number of ingredients exist in the culinary world, there are countless food ingredient pairings, but only a small number of pairings have been adopted by chefs and studied by food researchers. In this work, we propose KitcheNette which is a model that predicts food ingredient pairing scores and recommends optimal ingredient pairings. KitcheNette employs Siamese neural networks and is trained on our annotated dataset containing 300K scores of pairings generated from numerous ingredients in food recipes. As the results demonstrate, our model not only outperforms other baseline models, but also can recommend complementary food pairings and discover novel ingredient pairings.

#23 MNN: Multimodal Attentional Neural Networks for Diagnosis Prediction [PDF] [Copy] [Kimi] [REL]

Authors: Zhi Qiao, Xian Wu, Shen Ge, Wei Fan

Diagnosis prediction plays a key role in clinical decision supporting process, which attracted extensive research attention recently. Existing studies mainly utilize discrete medical codes (e.g., the ICD codes and procedure codes) as the primary features in prediction. However, in real clinical settings, such medical codes could be either incomplete or erroneous. For example, missed diagnosis will neglect some codes which should be included, mis-diagnosis will generate incorrect medical codes. To increase the robustness towards noisy data, we introduce textual clinical notes in addition to medical codes. Combining information from both sides will lead to improved understanding towards clinical health conditions. To accommodate both the textual notes and discrete medical codes in the same framework, we propose Multimodal Attentional Neural Networks (MNN), which integrates multi-modal data in a collaborative manner. Experimental results on real world EHR datasets demonstrate the advantages of MNN in terms of both robustness and accuracy.

#24 Global Robustness Evaluation of Deep Neural Networks with Provable Guarantees for the Hamming Distance [PDF] [Copy] [Kimi] [REL]

Authors: Wenjie Ruan, Min Wu, Youcheng Sun, Xiaowei Huang, Daniel Kroening, Marta Kwiatkowska

Deployment of deep neural networks (DNNs) in safety-critical systems requires provable guarantees for their correct behaviours. We compute the maximal radius of a safe norm ball around a given input, within which there are no adversarial examples for a trained DNN. We define global robustness as an expectation of the maximal safe radius over a test dataset, and develop an algorithm to approximate the global robustness measure by iteratively computing its lower and upper bounds. Our algorithm is the first efficient method for the Hamming (L0) distance, and we hypothesise that this norm is a good proxy for a certain class of physical attacks. The algorithm is anytime, i.e., it returns intermediate bounds and robustness estimates that are gradually, but strictly, improved as the computation proceeds; tensor-based, i.e., the computation is conducted over a set of inputs simultaneously to enable efficient GPU computation; and has provable guarantees, i.e., both the bounds and the robustness estimates can converge to their optimal values. Finally, we demonstrate the utility of our approach by applying the algorithm to a set of challenging problems.

#25 Pre-training of Graph Augmented Transformers for Medication Recommendation [PDF] [Copy] [Kimi] [REL]

Authors: Junyuan Shang, Tengfei Ma, Cao Xiao, Jimeng Sun

Medication recommendation is an important healthcare application. It is commonly formulated as a temporal prediction task. Hence, most existing works only utilize longitudinal electronic health records (EHRs) from a small number of patients with multiple visits ignoring a large number of patients with a single visit (selection bias). Moreover, important hierarchical knowledge such as diagnosis hierarchy is not leveraged in the representation learning process. Despite the success of deep learning techniques in computational phenotyping, most previous approaches have two limitations: task-oriented representation and ignoring hierarchies of medical codes. To address these challenges, we propose G-BERT, a new model to combine the power of Graph Neural Networks (GNNs) and BERT (Bidirectional Encoder Representations from Transformers) for medical code representation and medication recommendation. We use GNNs to represent the internal hierarchical structures of medical codes. Then we integrate the GNN representation into a transformer-based visit encoder and pre-train it on EHR data from patients only with a single visit. The pre-trained visit encoder and representation are then fine-tuned for downstream predictive tasks on longitudinal EHRs from patients with multiple visits. G-BERT is the first to bring the language model pre-training schema into the healthcare domain and it achieved state-of-the-art performance on the medication recommendation task.