IJCAI.2019 - Machine Learning Applications

| Total: 18

#1 Predicting the Visual Focus of Attention in Multi-Person Discussion Videos [PDF] [Copy] [Kimi] [REL]

Authors: Chongyang Bai, Srijan Kumar, Jure Leskovec, Miriam Metzger, Jay F. Nunamaker, V. S. Subrahmanian

Visual focus of attention in multi-person discussions is a crucial nonverbal indicator in tasks such as inter-personal relation inference, speech transcription, and deception detection. However, predicting the focus of attention remains a challenge because the focus changes rapidly, the discussions are highly dynamic, and the people's behaviors are inter-dependent. Here we propose ICAF (Iterative Collective Attention Focus), a collective classification model to jointly learn the visual focus of attention of all people. Every person is modeled using a separate classifier. ICAF models the people collectively---the predictions of all other people's classifiers are used as inputs to each person's classifier. This explicitly incorporates inter-dependencies between all people's behaviors. We evaluate ICAF on a novel dataset of 5 videos (35 people, 109 minutes, 7604 labels in all) of the popular Resistance game and a widely-studied meeting dataset with supervised prediction. See our demo at https://cs.dartmouth.edu/dsail/demos/icaf. ICAF outperforms the strongest baseline by 1%--5% accuracy in predicting the people's visual focus of attention. Further, we propose a lightly supervised technique to train models in the absence of training labels. We show that light-supervised ICAF performs similar to the supervised ICAF, thus showing its effectiveness and generality to previously unseen videos.


#2 A Quantum-inspired Classical Algorithm for Separable Non-negative Matrix Factorization [PDF] [Copy] [Kimi] [REL]

Authors: Zhihuai Chen, Yinan Li, Xiaoming Sun, Pei Yuan, Jialin Zhang

Non-negative Matrix Factorization (NMF) asks to decompose a (entry-wise) non-negative matrix into the product of two smaller-sized nonnegative matrices, which has been shown intractable in general. In order to overcome this issue, separability assumption is introduced which assumes all data points are in a conical hull. This assumption makes NMF tractable and widely used in text analysis and image processing, but still impractical for huge-scale datasets. In this paper, inspired by recent development on dequantizing techniques, we propose a new classical algorithm for separable NMF problem. Our new algorithm runs in polynomial time in the rank and logarithmic in the size of input matrices, which achieves an exponential speedup in the low-rank setting.


#3 MLRDA: A Multi-Task Semi-Supervised Learning Framework for Drug-Drug Interaction Prediction [PDF] [Copy] [Kimi] [REL]

Authors: Xu Chu, Yang Lin, Yasha Wang, Leye Wang, Jiangtao Wang, Jingyue Gao

Drug-drug interactions (DDIs) are a major cause of preventable hospitalizations and deaths. Recently, researchers in the AI community try to improve DDI prediction in two directions, incorporating multiple drug features to better model the pharmacodynamics and adopting multi-task learning to exploit associations among DDI types. However, these two directions are challenging to reconcile due to the sparse nature of the DDI labels which inflates the risk of overfitting of multi-task learning models when incorporating multiple drug features. In this paper, we propose a multi-task semi-supervised learning framework MLRDA for DDI prediction. MLRDA effectively exploits information that is beneficial for DDI prediction in unlabeled drug data by leveraging a novel unsupervised disentangling loss CuXCov. The CuXCov loss cooperates with the classification loss to disentangle the DDI prediction relevant part from the irrelevant part in a representation learnt by an autoencoder, which helps to ease the difficulty in mining useful information for DDI prediction in both labeled and unlabeled drug data. Moreover, MLRDA adopts a multi-task learning framework to exploit associations among DDI types. Experimental results on real-world datasets demonstrate that MLRDA significantly outperforms state-of-the-art DDI prediction methods by up to 10.3% in AUPR.


#4 Combining ADMM and the Augmented Lagrangian Method for Efficiently Handling Many Constraints [PDF] [Copy] [Kimi] [REL]

Authors: Joachim Giesen, Soeren Laue

Many machine learning methods entail minimizing a loss-function that is the sum of the losses for each data point. The form of the loss function is exploited algorithmically, for instance in stochastic gradient descent (SGD) and in the alternating direction method of multipliers (ADMM). However, there are also machine learning methods where the entailed optimization problem features the data points not in the objective function but in the form of constraints, typically one constraint per data point. Here, we address the problem of solving convex optimization problems with many convex constraints. Our approach is an extension of ADMM. The straightforward implementation of ADMM for solving constrained optimization problems in a distributed fashion solves constrained subproblems on different compute nodes that are aggregated until a consensus solution is reached. Hence, the straightforward approach has three nested loops: one for reaching consensus, one for the constraints, and one for the unconstrained problems. Here, we show that solving the costly constrained subproblems can be avoided. In our approach, we combine the ability of ADMM to solve convex optimization problems in a distributed setting with the ability of the augmented Lagrangian method to solve constrained optimization problems. Consequently, our algorithm only needs two nested loops. We prove that it inherits the convergence guarantees of both ADMM and the augmented Lagrangian method. Experimental results corroborate our theoretical findings.


#5 Playing Card-Based RTS Games with Deep Reinforcement Learning [PDF] [Copy] [Kimi] [REL]

Authors: Tianyu Liu, Zijie Zheng, Hongchang Li, Kaigui Bian, Lingyang Song

Game AI is of great importance as games are simulations of reality. Recent research on game AI has shown much progress in various kinds of games, such as console games, board games and MOBA games. However, the exploration in RTS games remains a challenge for their huge state space, imperfect information, sparse rewards and various strategies. Besides, the typical card-based RTS games have complex card features and are still lacking solutions. We present a deep model SEAT (selection-attention) to play card-based RTS games. The SEAT model includes two parts, a selection part for card choice and an attention part for card usage, and it learns from scratch via deep reinforcement learning. Comprehensive experiments are performed on Clash Royale, a popular mobile card-based RTS game. Empirical results show that the SEAT model agent makes it to reach a high winning rate against rule-based agents and decision-tree-based agent.


#6 FSM: A Fast Similarity Measurement for Gene Regulatory Networks via Genes' Influence Power [PDF] [Copy] [Kimi] [REL]

Authors: Zhongzhou Liu, Wenbin Hu

The problem of graph similarity measurement is fundamental in both complex networks and bioinformatics researches. Gene regulatory networks (GRNs) describe the interactions between the molecules in organisms, and are widely studied in the fields of medical AI. By measuring the similarity between GRNs, significant information can be obtained to assist the applications like gene functions prediction, drug development and medical diagnosis. Most of the existing similarity measurements have been focusing on the graph isomorphisms and are usually NP-hard problems. Thus, they are not suitable for applications in biology and clinical research due to the complexity and large-scale features of real-world GRNs. In this paper, a fast similarity measurement method called FSM for GRNs is proposed. Unlike the conventional measurements, it pays more attention to the differences between those influential genes. For the convenience and reliability, a new index defined as influence power is adopted to describe the influential genes which have greater position in a GRN. FSM was applied in nine datasets of various scales and is compared with state-of-art methods. The results demonstrated that it ran significantly faster than other methods without sacrificing measurement performance.


#7 Pseudo Supervised Matrix Factorization in Discriminative Subspace [PDF] [Copy] [Kimi] [REL]

Authors: Jiaqi Ma, Yipeng Zhang, Lefei Zhang, Bo Du, Dapeng Tao

Non-negative Matrix Factorization (NMF) and spectral clustering have been proved to be efficient and effective for data clustering tasks and have been applied to various real-world scenes. However, there are still some drawbacks in traditional methods: (1) most existing algorithms only consider high-dimensional data directly while neglect the intrinsic data structure in the low-dimensional subspace; (2) the pseudo-information got in the optimization process is not relevant to most spectral clustering and manifold regularization methods. In this paper, a novel unsupervised matrix factorization method, Pseudo Supervised Matrix Factorization (PSMF), is proposed for data clustering. The main contributions are threefold: (1) to cluster in the discriminant subspace, Linear Discriminant Analysis (LDA) combines with NMF to become a unified framework; (2) we propose a pseudo supervised manifold regularization term which utilizes the pseudo-information to instruct the regularization term in order to find subspace that discriminates different classes; (3) an efficient optimization algorithm is designed to solve the proposed problem with proved convergence. Extensive experiments on multiple benchmark datasets illustrate that the proposed model outperforms other state-of-the-art clustering algorithms.


#8 Representation Learning-Assisted Click-Through Rate Prediction [PDF] [Copy] [Kimi] [REL]

Authors: Wentao Ouyang, Xiuwu Zhang, Shukui Ren, Chao Qi, Zhaojie Liu, Yanlong Du

Click-through rate (CTR) prediction is a critical task in online advertising systems. Most existing methods mainly model the feature-CTR relationship and suffer from the data sparsity issue. In this paper, we propose DeepMCP, which models other types of relationships in order to learn more informative and statistically reliable feature representations, and in consequence to improve the performance of CTR prediction. In particular, DeepMCP contains three parts: a matching subnet, a correlation subnet and a prediction subnet. These subnets model the user-ad, ad-ad and feature-CTR relationship respectively. When these subnets are jointly optimized under the supervision of the target labels, the learned feature representations have both good prediction powers and good representation abilities. Experiments on two large-scale datasets demonstrate that DeepMCP outperforms several state-of-the-art models for CTR prediction.


#9 Dynamic Electronic Toll Collection via Multi-Agent Deep Reinforcement Learning with Edge-Based Graph Convolutional Networks [PDF] [Copy] [Kimi] [REL]

Authors: Wei Qiu, Haipeng Chen, Bo An

Over the past decades, Electronic Toll Collection (ETC) systems have been proved the capability of alleviating traffic congestion in urban areas. Dynamic Electronic Toll Collection (DETC) was recently proposed to further improve the efficiency of ETC, where tolls are dynamically set based on traffic dynamics. However, computing the optimal DETC scheme is computationally difficult and existing approaches are limited to small scale or partial road networks, which significantly restricts the adoption of DETC. To this end, we propose a novel multi-agent reinforcement learning (RL) approach for DETC. We make several key contributions: i) an enhancement over the state-of-the-art RL-based method with a deep neural network representation of the policy and value functions and a temporal difference learning framework to accelerate the update of target values, ii) a novel edge-based graph convolutional neural network (eGCN) to extract the spatio-temporal correlations of the road network state features, iii) a novel cooperative multi-agent reinforcement learning (MARL) which divides the whole road network into partitions according to their geographic and economic characteristics and trains a tolling agent for each partition. Experimental results show that our approach can scale up to realistic-sized problems with robust performance and significantly outperform the state-of-the-art method.


#10 FireCast: Leveraging Deep Learning to Predict Wildfire Spread [PDF] [Copy] [Kimi] [REL]

Authors: David Radke, Anna Hessler, Dan Ellsworth

Destructive wildfires result in billions of dollars in damage each year and are expected to increase in frequency, duration, and severity due to climate change. The current state-of-the-art wildfire spread models rely on mathematical growth predictions and physics-based models, which are difficult and computationally expensive to run. We present and evaluate a novel system, FireCast. FireCast combines artificial intelligence (AI) techniques with data collection strategies from geographic information systems (GIS). FireCast predicts which areas surrounding a burning wildfire have high-risk of near-future wildfire spread, based on historical fire data and using modest computational resources. FireCast is compared to a random prediction model and a commonly used wildfire spread model, Farsite, outperforming both with respect to total accuracy, recall, and F-score.


#11 Faster Distributed Deep Net Training: Computation and Communication Decoupled Stochastic Gradient Descent [PDF] [Copy] [Kimi] [REL]

Authors: Shuheng Shen, Linli Xu, Jingchang Liu, Xianfeng Liang, Yifei Cheng

With the increase in the amount of data and the expansion of model scale, distributed parallel training becomes an important and successful technique to address the optimization challenges. Nevertheless, although distributed stochastic gradient descent (SGD) algorithms can achieve a linear iteration speedup, they are limited significantly in practice by the communication cost, making it difficult to achieve a linear time speedup. In this paper, we propose a computation and communication decoupled stochastic gradient descent (CoCoD-SGD) algorithm to run computation and communication in parallel to reduce the communication cost. We prove that CoCoD-SGD has a linear iteration speedup with respect to the total computation capability of the hardware resources. In addition, it has a lower communication complexity and better time speedup comparing with traditional distributed SGD algorithms. Experiments on deep neural network training demonstrate the significant improvements of CoCoD-SGD: when training ResNet18 and VGG16 with 16 Geforce GTX 1080Ti GPUs, CoCoD-SGD is up to 2-3 x faster than traditional synchronous SGD.


#12 Randomized Adversarial Imitation Learning for Autonomous Driving [PDF] [Copy] [Kimi] [REL]

Authors: MyungJae Shin, Joongheon Kim

With the evolution of various advanced driver assistance system (ADAS) platforms, the design of autonomous driving system is becoming more complex and safety-critical. The autonomous driving system simultaneously activates multiple ADAS functions; and thus it is essential to coordinate various ADAS functions. This paper proposes a randomized adversarial imitation learning (RAIL) method that imitates the coordination of autonomous vehicle equipped with advanced sensors. The RAIL policies are trained through derivative-free optimization for the decision maker that coordinates the proper ADAS functions, e.g., smart cruise control and lane keeping system. Especially, the proposed method is also able to deal with the LIDAR data and makes decisions in complex multi-lane highways and multi-agent environments.


#13 Scaling Fine-grained Modularity Clustering for Massive Graphs [PDF] [Copy] [Kimi] [REL]

Authors: Hiroaki Shiokawa, Toshiyuki Amagasa, Hiroyuki Kitagawa

Modularity clustering is an essential tool to understand complicated graphs. However, existing methods are not applicable to massive graphs due to two serious weaknesses. (1) It is difficult to fully reproduce ground-truth clusters due to the resolution limit problem. (2) They are computationally expensive because all nodes and edges must be computed iteratively. This paper proposes gScarf, which outputs fine-grained clusters within a short running time. To overcome the aforementioned weaknesses, gScarf dynamically prunes unnecessary nodes and edges, ensuring that it captures fine-grained clusters. Experiments show that gScarf outperforms existing methods in terms of running time while finding clusters with high accuracy.


#14 Node Embedding over Temporal Graphs [PDF] [Copy] [Kimi] [REL]

Authors: Uriel Singer, Ido Guy, Kira Radinsky

In this work, we present a method for node embedding in temporal graphs. We propose an algorithm that learns the evolution of a temporal graph's nodes and edges over time and incorporates this dynamics in a temporal node embedding framework for different graph prediction tasks. We present a joint loss function that creates a temporal embedding of a node by learning to combine its historical temporal embeddings, such that it optimizes per given task (e.g., link prediction). The algorithm is initialized using static node embeddings, which are then aligned over the representations of a node at different time points, and eventually adapted for the given task in a joint optimization. We evaluate the effectiveness of our approach over a variety of temporal graphs for the two fundamental tasks of temporal link prediction and multi-label node classification, comparing to competitive baselines and algorithmic alternatives. Our algorithm shows performance improvements across many of the datasets and baselines and is found particularly effective for graphs that are less cohesive, with a lower clustering coefficient.


#15 Medical Concept Embedding with Multiple Ontological Representations [PDF] [Copy] [Kimi] [REL]

Authors: Lihong Song, Chin Wang Cheong, Kejing Yin, William K. Cheung, Benjamin C. M. Fung, Jonathan Poon

Learning representations of medical concepts from the Electronic Health Records (EHR) has been shown effective for predictive analytics in healthcare. Incorporation of medical ontologies has also been explored to further enhance the accuracy and to ensure better alignment with the known medical knowledge. Most of the existing work assumes that medical concepts under the same ontological category should share similar representations, which however does not always hold. In particular, the categorizations in medical ontologies were established with various factors being considered. Medical concepts even under the same ontological category may not follow similar occurrence patterns in the EHR data, leading to contradicting objectives for the representation learning. In this paper, we propose a deep learning model called MMORE which alleviates this conflicting objective issue by allowing multiple representations to be inferred for each ontological category via an attention mechanism. We apply MMORE to diagnosis prediction and our experimental results show that the representations obtained by MMORE can achieve better predictive accuracy and result in clinically meaningful sub-categorization of the existing ontological categories.


#16 Learning Shared Vertex Representation in Heterogeneous Graphs with Convolutional Networks for Recommendation [PDF] [Copy] [Kimi] [REL]

Authors: Yanan Xu, Yanmin Zhu, Yanyan Shen, Jiadi Yu

Collaborative Filtering (CF) is among the most successful techniques in recommendation tasks. Recent works have shown a boost of performance of CF when introducing the pairwise relationships between users and items or among items (users) using interaction data. However, these works usually only utilize one kind of information, i.e., user preference in a user-item interaction matrix or item dependency in interaction sequences which can limit the recommendation performance. In this paper, we propose to mine three kinds of information (user preference, item dependency, and user similarity on behaviors) by converting interaction sequence data into multiple graphs (i.e., a user-item graph, an item-item graph, and a user-subseq graph). We design a novel graph convolutional network (PGCN) to learn shared representations of users and items with the three heterogeneous graphs. In our approach, a neighbor pooling and a convolution operation are designed to aggregate features of neighbors. Extensive experiments on two real-world datasets demonstrate that our graph convolution approaches outperform various competitive methods in terms of two metrics, and the heterogeneous graphs are proved effective for improving recommendation performance.


#17 Dual-Path in Dual-Path Network for Single Image Dehazing [PDF] [Copy] [Kimi] [REL]

Authors: Aiping Yang, Haixin Wang, Zhong Ji, Yanwei Pang, Ling Shao

Recently, deep learning-based single image dehazing method has been a popular approach to tackle dehazing. However, the existing dehazing approaches are performed directly on the original hazy image, which easily results in image blurring and noise amplifying. To address this issue, the paper proposes a DPDP-Net (Dual-Path in Dual-Path network) framework by employing a hierarchical dual path network. Specifically, the first-level dual-path network consists of a Dehazing Network and a Denoising Network, where the Dehazing Network is responsible for haze removal in the structural layer, and the Denoising Network deals with noise in the textural layer, respectively. And the second-level dual-path network lies in the Dehazing Network, which has an AL-Net (Atmospheric Light Network) and a TM-Net (Transmission Map Network), respectively. Concretely, the AL-Net aims to train the non-uniform atmospheric light, while the TM-Net aims to train the transmission map that reflects the visibility of the image. The final dehazing image is obtained by nonlinearly fusing the output of the Denoising Network and the Dehazing Network. Extensive experiments demonstrate that our proposed DPDP-Net achieves competitive performance against the state-of-the-art methods on both synthetic and real-world images.


#18 Disparity-preserved Deep Cross-platform Association for Cross-platform Video Recommendation [PDF] [Copy] [Kimi] [REL]

Authors: Shengze Yu, Xin Wang, Wenwu Zhu, Peng Cui, Jingdong Wang

Cross-platform recommendation aims to improve recommendation accuracy through associating information from different platforms. Existing cross-platform recommendation approaches assume all cross-platform information to be consistent with each other and can be aligned. However, there remain two unsolved challenges: i) there exist inconsistencies in cross-platform association due to platform-specific disparity, and ii) data from distinct platforms may have different semantic granularities. In this paper, we propose a cross-platform association model for cross-platform video recommendation, i.e., Disparity-preserved Deep Cross-platform Association (DCA), taking platform-specific disparity and granularity difference into consideration. The proposed DCA model employs a partially-connected multi-modal autoencoder, which is capable of explicitly capturing platform-specific information, as well as utilizing nonlinear mapping functions to handle granularity differences. We then present a cross-platform video recommendation approach based on the proposed DCA model. Extensive experiments for our cross-platform recommendation framework on real-world dataset demonstrate that the proposed DCA model significantly outperform existing cross-platform recommendation methods in terms of various evaluation metrics.