| Total: 58
Sufficient physical activity and restful sleep play a major role in the prevention and cure of many chronic conditions. Being able to proactively screen and monitor such chronic conditions would be a big step forward for overall health. The rapid increase in the popularity of wearable devices pro-vides a significant new source, making it possible to track the user’s lifestyle real-time. In this paper, we propose a novel unsupervised representation learning technique called activ-ity2vecthat learns and “summarizes” the discrete-valued ac-tivity time-series. It learns the representations with three com-ponents: (i) the co-occurrence and magnitude of the activ-ity levels in a time-segment, (ii) neighboring context of the time-segment, and (iii) promoting subject-invariance with ad-versarial training. We evaluate our method on four disorder prediction tasks using linear classifiers. Empirical evaluation demonstrates that our proposed method scales and performs better than many strong baselines. The adversarial regime helps improve the generalizability of our representations by promoting subject invariant features. We also show that using the representations at the level of a day works the best since human activity is structured in terms of daily routines.
Deep learning based automatic feature extraction methods have radically transformed speaker identification and facial recognition. Current approaches are typically specialized for individual domains, such as Deep Vectors (D-Vectors) for speaker identification. We provide two distinct contributions: a generalized framework for biometric verification inspired by D-Vectors and novel models that outperform current stateof-the-art approaches. Our approach supports substitution of various feature extraction models and improves the robustness of verification tests across domains. We demonstrate the framework and models for two different behavioral biometric verification problems: keystroke and mobile gait. We present a comprehensive empirical analysis comparing our framework to the state-of-the-art in both domains. Our models perform verification with higher accuracy using orders of magnitude less data than state-of-the-art approaches in both domains. We believe that the combination of high accuracy and practical data requirements will enable application of behavioral biometric models outside of the laboratory in support of much-needed improvements to cyber security.
Thoroughly understanding how energy consumption is disaggregated into individual appliances can help reduce household expenses, integrate renewable sources of energy, and lead to efficient use of energy. In this work, we propose a deep latent generative model based on variational recurrent neural networks (VRNNs) for energy disaggregation. Our model jointly disaggregates the aggregated energy signal into individual appliance signals, achieving superior performance when compared to the state-of-the-art models for energy disaggregation, yielding a 29% and 41% performance improvement on two energy datasets, respectively, without explicitly encoding temporal/contextual information or heuristics. Our model also achieves better prediction performance on lowpower appliances, paving the way for a more nuanced disaggregation model. The structured output prediction in our model helps in accurately discerning which appliance(s) contribute to the aggregated power consumption, thus providing a more useful and meaningful disaggregation model.
One dimension of modernist poetry is introducing entities in surprising contexts, such as wheelbarrow in Bob Dylan’s feel like falling in love with the first woman I meet/ putting her in a wheelbarrow. This paper considers the problem of teaching a neural language model to select poetic entities, based on local context windows. We do so by fine-tuning and evaluating language models on the poetry of American modernists, both on seen and unseen poets, and across a range of experimental designs. We also compare the performance of our poetic language model to human, professional poets. Our main finding is that, perhaps surprisingly, modernist poetry differs most from ordinary language when entities are concrete, like wheelbarrow, and while our fine-tuning strategy successfully adapts to poetic language in general, outperforming professional poets, the biggest error reduction is observed with concrete entities.
This paper presents a novel unsupervised domain adaptation framework, called Synergistic Image and Feature Adaptation (SIFA), to effectively tackle the problem of domain shift. Domain adaptation has become an important and hot topic in recent studies on deep learning, aiming to recover performance degradation when applying the neural networks to new testing domains. Our proposed SIFA is an elegant learning diagram which presents synergistic fusion of adaptations from both image and feature perspectives. In particular, we simultaneously transform the appearance of images across domains and enhance domain-invariance of the extracted features towards the segmentation task. The feature encoder layers are shared by both perspectives to grasp their mutual benefits during the end-to-end learning procedure. Without using any annotation from the target domain, the learning of our unified model is guided by adversarial losses, with multiple discriminators employed from various aspects. We have extensively validated our method with a challenging application of crossmodality medical image segmentation of cardiac structures. Experimental results demonstrate that our SIFA model recovers the degraded performance from 17.2% to 73.0%, and outperforms the state-of-the-art methods by a significant margin.
Massive amount of spatio-temporal data that contain location and text content are being generated by location-based social media. These spatio-temporal messages cover a wide range of topics. It is of great significance to discover local trending topics based on users’ location-based and topicbased requirements. We develop a region-based message exploration mechanism that retrieve spatio-temporal message clusters from a stream of spatio-temporal messages based on users’ preferences on message topic and message spatial distribution. Additionally, we propose a region summarization algorithm that finds a subset of representative messages in a cluster to summarize the topics and the spatial attributes of messages in the cluster. We evaluate the efficacy and efficiency of our proposal on two real-world datasets and the results demonstrate that our solution is capable of high efficiency and effectiveness compared with baselines.
Sparse reward games, such as the infamous Montezuma’s Revenge, pose a significant challenge for Reinforcement Learning (RL) agents. Hierarchical RL, which promotes efficient exploration via subgoals, has shown promise in these games. However, existing agents rely either on human domain knowledge or slow autonomous methods to derive suitable subgoals. In this work, we describe a new, autonomous approach for deriving subgoals from raw pixels that is more efficient than competing methods. We propose a novel intrinsic reward scheme for exploiting the derived subgoals, applying it to three Atari games with sparse rewards. Our agent’s performance is comparable to that of state-of-the-art methods, demonstrating the usefulness of the subgoals found.
Graph convolutional neural networks (GCNN) have become an increasingly active field of research. It models the spatial dependencies of nodes in a graph with a pre-defined Laplacian matrix based on node distances. However, in many application scenarios, spatial dependencies change over time, and the use of fixed Laplacian matrix cannot capture the change. To track the spatial dependencies among traffic data, we propose a dynamic spatio-temporal GCNN for accurate traffic forecasting. The core of our deep learning framework is the finding of the change of Laplacian matrix with a dynamic Laplacian matrix estimator. To enable timely learning with a low complexity, we creatively incorporate tensor decomposition into the deep learning framework, where real-time traffic data are decomposed into a global component that is stable and depends on long-term temporal-spatial traffic relationship and a local component that captures the traffic fluctuations. We propose a novel design to estimate the dynamic Laplacian matrix of the graph with above two components based on our theoretical derivation, and introduce our design basis. The forecasting performance is evaluated with two realtime traffic datasets. Experiment results demonstrate that our network can achieve up to 25% accuracy improvement.
Human-object interactions (HOI) recognition and pose estimation are two closely related tasks. Human pose is an essential cue for recognizing actions and localizing the interacted objects. Meanwhile, human action and their interacted objects’ localizations provide guidance for pose estimation. In this paper, we propose a turbo learning framework to perform HOI recognition and pose estimation simultaneously. First, two modules are designed to enforce message passing between the tasks, i.e. pose aware HOI recognition module and HOI guided pose estimation module. Then, these two modules form a closed loop to utilize the complementary information iteratively, which can be trained in an end-to-end manner. The proposed method achieves the state-of-the-art performance on two public benchmarks including Verbs in COCO (V-COCO) and HICO-DET datasets.
Urban regions are places where people live, work, consume, and entertain. In this study, we investigate the problem of learning an embedding space for regions. Studying the representations of regions can help us to better understand the patterns, structures, and dynamics of cities, support urban planning, and, ultimately, to make our cities more livable and sustainable. While some efforts have been made for learning the embeddings of regions, existing methods can be improved by incorporating locality-constrained spatial autocorrelations into an encode-decode framework. Such embedding strategy is capable of taking into account both intra-region structural information and inter-region spatial autocorrelations. To this end, we propose to learn the representations of regions via a new embedding strategy with awareness of locality-constrained spatial autocorrelations. Specifically, we first construct multi-view (i.e., distance and mobility connectivity) POI-POI networks to represent regions. In addition, we introduce two properties into region embedding: (i) spatial autocorrelations: a global similarity between regions; (ii) top-k locality: spatial autocorrelations locally and approximately reside on top k most autocorrelated regions. We propose a new encoder-decoder based formulation that preserves the two properties while remaining efficient. As an application, we exploit the learned embeddings to predict the mobile checkin popularity of regions. Finally, extensive experiments with real-world urban region data demonstrate the effectiveness and efficiency of our method.
A smart grid is an efficient and sustainable energy system that integrates diverse generation entities, distributed storage capacity, and smart appliances and buildings. A smart grid brings new kinds of participants in the energy market served by it, whose effect on the grid can only be determined through high fidelity simulations. Power TAC offers one such simulation platform using real-world weather data and complex state-of-the-art customer models. In Power TAC, autonomous energy brokers compete to make profits across tariff, wholesale and balancing markets while maintaining the stability of the grid. In this paper, we design an autonomous broker VidyutVanika, the runner-up in the 2018 Power TAC competition. VidyutVanika relies on reinforcement learning (RL) in the tariff market and dynamic programming in the wholesale market to solve modified versions of known Markov Decision Process (MDP) formulations in the respective markets. The novelty lies in defining the reward functions for MDPs, solving these MDPs, and the application of these solutions to real actions in the market. Unlike previous participating agents, VidyutVanika uses a neural network to predict the energy consumption of various customers using weather data. We use several heuristic ideas to bridge the gap between the restricted action spaces of the MDPs and the much more extensive action space available to VidyutVanika. These heuristics allow VidyutVanika to convert near-optimal fixed tariffs to time-of-use tariffs aimed at mitigating transmission capacity fees, spread out its orders across several auctions in the wholesale market to procure energy at a lower price, more accurately estimate parameters required for implementing the MDP solution in the wholesale market, and account for wholesale procurement costs while optimizing tariffs. We use Power TAC 2018 tournament data and controlled experiments to analyze the performance of VidyutVanika, and illustrate the efficacy of the above strategies.
Forecasting the traffic flows is a critical issue for researchers and practitioners in the field of transportation. However, it is very challenging since the traffic flows usually show high nonlinearities and complex patterns. Most existing traffic flow prediction methods, lacking abilities of modeling the dynamic spatial-temporal correlations of traffic data, thus cannot yield satisfactory prediction results. In this paper, we propose a novel attention based spatial-temporal graph convolutional network (ASTGCN) model to solve traffic flow forecasting problem. ASTGCN mainly consists of three independent components to respectively model three temporal properties of traffic flows, i.e., recent, daily-periodic and weekly-periodic dependencies. More specifically, each component contains two major parts: 1) the spatial-temporal attention mechanism to effectively capture the dynamic spatialtemporal correlations in traffic data; 2) the spatial-temporal convolution which simultaneously employs graph convolutions to capture the spatial patterns and common standard convolutions to describe the temporal features. The output of the three components are weighted fused to generate the final prediction results. Experiments on two real-world datasets from the Caltrans Performance Measurement System (PeMS) demonstrate that the proposed ASTGCN model outperforms the state-of-the-art baselines.
Novice programmers often struggle with the formal syntax of programming languages. In the traditional classroom setting, they can make progress with the help of real time feedback from their instructors which is often impossible to get in the massive open online course (MOOC) setting. Syntactic error repair techniques have huge potential to assist them at scale. Towards this, we design a novel programming language correction framework amenable to reinforcement learning. The framework allows an agent to mimic human actions for text navigation and editing. We demonstrate that the agent can be trained through self-exploration directly from the raw input, that is, program text itself, without either supervision or any prior knowledge of the formal syntax of the programming language. We evaluate our technique on a publicly available dataset containing 6975 erroneous C programs with typographic errors, written by students during an introductory programming course. Our technique fixes 1699 (24.4%) programs completely and 1310 (18.8%) program partially, outperforming DeepFix, a state-of-the-art syntactic error repair technique, which uses a fully supervised neural machine translation approach.
Despite the great success of word embedding, sentence embedding remains a not-well-solved problem. In this paper, we present a supervised learning framework to exploit sentence embedding for the medical question answering task. The learning framework consists of two main parts: 1) a sentence embedding producing module, and 2) a scoring module. The former is developed with contextual self-attention and multi-scale techniques to encode a sentence into an embedding tensor. This module is shortly called Contextual self-Attention Multi-scale Sentence Embedding (CAMSE). The latter employs two scoring strategies: Semantic Matching Scoring (SMS) and Semantic Association Scoring (SAS). SMS measures similarity while SAS captures association between sentence pairs: a medical question concatenated with a candidate choice, and a piece of corresponding supportive evidence. The proposed framework is examined by two Medical Question Answering(MedicalQA) datasets which are collected from real-world applications: medical exam and clinical diagnosis based on electronic medical records (EMR). The comparison results show that our proposed framework achieved significant improvements compared to competitive baseline approaches. Additionally, a series of controlled experiments are also conducted to illustrate that the multi-scale strategy and the contextual self-attention layer play important roles for producing effective sentence embedding, and the two kinds of scoring strategies are highly complementary to each other for question answering problems.
As one of the major frauds in financial services, cash-out fraud is that users pursue cash gains with illegal or insincere means. Conventional solutions for the cash-out user detection are to perform subtle feature engineering for each user and then apply a classifier, such as GDBT and Neural Network. However, users in financial services have rich interaction relations, which are seldom fully exploited by conventional solutions. In this paper, with the real datasets in Ant Credit Pay of Ant Financial Services Group, we first study the cashout user detection problem and propose a novel hierarchical attention mechanism based cash-out user detection model, called HACUD. Specifically, we model different types of objects and their rich attributes and interaction relations in the scenario of credit payment service with an Attributed Heterogeneous Information Network (AHIN). The HACUD model enhances feature representation of objects through meta-path based neighbors exploiting different aspects of structure information in AHIN. Furthermore, a hierarchical attention mechanism is elaborately designed to model user’s preferences towards attributes and meta-paths. Experimental results on two real datasets show that the HACUD outperforms the state-of-the-art methods.
Deep reinforcement learning (DRL) has achieved surpassing human performance on Atari games, using raw pixels and rewards to learn everything. However, first-person-shooter (FPS) games in 3D environments contain higher levels of human concepts (enemy, weapon, spatial structure, etc.) and a large action space. In this paper, we explore a novel method which can plan on temporally-extended action sequences, which we refer as Combo-Action to compress the action space. We further train a deep recurrent Q-learning network model as a high-level controller, called supervisory network, to manage the Combo-Actions. Our method can be boosted with auxiliary tasks (enemy detection and depth prediction), which enable the agent to extract high-level concepts in the FPS games. Extensive experiments show that our method is efficient in training process and outperforms previous stateof-the-art approaches by a large margin. Ablation study experiments also indicate that our method can boost the performance of the FPS agent in a reasonable way.
While deep learning models have achieved unprecedented success in various domains, there is also a growing concern of adversarial attacks against related applications. Recent results show that by adding a small amount of perturbations to an image (imperceptible to humans), the resulting adversarial examples can force a classifier to make targeted mistakes. So far, most existing works focus on crafting adversarial examples in the digital domain, while limited efforts have been devoted to understanding the physical domain attacks. In this work, we explore the feasibility of generating robust adversarial examples that remain effective in the physical domain. Our core idea is to use an image-to-image translation network to simulate the digital-to-physical transformation process for generating robust adversarial examples. To validate our method, we conduct a large-scale physical-domain experiment, which involves manually taking more than 3000 physical domain photos. The results show that our method outperforms existing ones by a large margin and demonstrates a high level of robustness and transferability.
This paper introduces a new type of Security Games (SG) played on a plane with targets moving along predefined straight line trajectories and its respective Mixed Integer Linear Programming (MILP) formulation. Three approaches for solving the game are proposed and experimentally evaluated: application of an MILP solver to finding exact solutions for small-size games, MILP-based extension of recently published zero-sum SG approach to the case of generalsum games for finding approximate solutions of medium-size games, and the use of Memetic Algorithm (MA) for mediumsize and large-size game instances, which are beyond MILP’s scalability. Utilization of MA is, to the best of our knowledge, a new idea in the field of SG. The novelty of proposed solution lies specifically in efficient chromosome-based game encoding and dedicated local improvement heuristics. In vast majority of test cases with known equilibrium profiles, the method leads to optimal solutions with high stability and approximately linear time scalability. Another advantage is an iteration-based construction of the system, which makes the approach essentially an anytime method. This property is of paramount importance in case of restrictive time limits, which could hinder the possibility of calculating an exact solution. On a general note, we believe that MA-based methods may offer a viable alternative to MILP solvers for complex games that require application of approximate solving methods.
Developing a computer vision-based algorithm for identifying dangerous vehicles requires a large amount of labeled accident data, which is difficult to collect in the real world. To tackle this challenge, we first develop a synthetic data generator built on top of a driving simulator. We then observe that the synthetic labels that are generated based on simulation results are very noisy, resulting in poor classification performance. In order to improve the quality of synthetic labels, we propose a new label adaptation technique that first extracts internal states of vehicles from the underlying driving simulator, and then refines labels by predicting future paths of vehicles based on a well-studied motion model. Via real-data experiments, we show that our dangerous vehicle classifier can reduce the missed detection rate by at least 18.5% compared with those trained with real data when time-to-collision is between 1.6s and 1.8s.
Taking speed reports from vehicles is a proven, inexpensive way to infer traffic conditions. However, due to concerns about privacy and bandwidth, not every vehicle occupant may want to transmit data about their location and speed in real time. We show how to drastically reduce the number of transmissions in two ways, both based on a Markov random field for modeling traffic speed and flow. First, we show that a only a small number of vehicles need to report from each location. We give a simple, probabilistic method that lets a group of vehicles decide on which subset will transmit a report, preserving privacy by coordinating without any communication. The second approach computes the potential value of any location’s speed report, emphasizing those reports that will most affect the overall speed inferences, and omitting those that contribute little value. Both methods significantly reduce the amount of communication necessary for accurate speed inferences on a road network.
Nowadays, it is common for one natural person to join multiple social networks to enjoy different kinds of services. Linking identical users across multiple social networks, also known as social network alignment, is an important problem of great research challenges. Existing methods usually link social identities on the pairwise sample level, which may lead to undesirable performance when the number of available annotations is limited. Motivated by the isomorphism information, in this paper we consider all the identities in a social network as a whole and perform social network alignment from the distribution level. The insight is that we aim to learn a projection function to not only minimize the distance between the distributions of user identities in two social networks, but also incorporate the available annotations as the learning guidance. We propose three models SNNAu, SNNAb and SNNAo to learn the projection function under the weakly-supervised adversarial learning framework. Empirically, we evaluate the proposed models over multiple datasets, and the results demonstrate the superiority of our proposals.
Bike-sharing systems, aiming at meeting the public’s need for ”last mile” transportation, are becoming popular in recent years. With an accurate demand prediction model, shared bikes, though with a limited amount, can be effectively utilized whenever and wherever there are travel demands. Despite that some deep learning methods, especially long shortterm memory neural networks (LSTMs), can improve the performance of traditional demand prediction methods only based on temporal representation, such improvement is limited due to a lack of mining complex spatial-temporal relations. To address this issue, we proposed a novel model named STG2Vec to learn the representation from heterogeneous spatial-temporal graph. Specifically, we developed an event-flow serializing method to encode the evolution of dynamic heterogeneous graph into a special language pattern such as word sequence in a corpus. Furthermore, a dynamic attention-based graph embedding model is introduced to obtain an importance-awareness vectorized representation of the event flow. Additionally, together with other multi-source information such as geographical position, historical transition patterns and weather, e.g., the representation learned by STG2Vec can be fed into the LSTMs for temporal modeling. Experimental results from Citi-Bike electronic usage records dataset in New York City have illustrated that the proposed model can achieve competitive prediction performance compared with its variants and other baseline models.
Generative Adversarial Networks (GANs) are powerful tools for reconstructing Compressed Sensing Magnetic Resonance Imaging (CS-MRI). However most recent works lack exploration of structure information of MRI images that is crucial for clinical diagnosis. To tackle this problem, we propose the Structure-Enhanced GAN (SEGAN) that aims at restoring structure information at both local and global scale. SEGAN defines a new structure regularization called Patch Correlation Regularization (PCR) which allows for efficient extraction of structure information. In addition, to further enhance the ability to uncover structure information, we propose a novel generator SU-Net by incorporating multiple-scale convolution filters into each layer. Besides, we theoretically analyze the convergence of stochastic factors contained in training process. Experimental results show that SEGAN is able to learn target structure information and achieves state-of-theart performance for CS-MRI reconstruction.
Crowd flow prediction is of great importance in a wide range of applications from urban planning, traffic control to public safety. It aims to predict the inflow (the traffic of crowds entering a region in a given time interval) and outflow (the traffic of crowds leaving a region for other places) of each region in the city with knowing the historical flow data. In this paper, we propose DeepSTN+, a deep learning-based convolutional model, to predict crowd flows in the metropolis. First, DeepSTN+ employs the ConvPlus structure to model the longrange spatial dependence among crowd flows in different regions. Further, PoI distributions and time factor are combined to express the effect of location attributes to introduce prior knowledge of the crowd movements. Finally, we propose an effective fusion mechanism to stabilize the training process, which further improves the performance. Extensive experimental results based on two real-life datasets demonstrate the superiority of our model, i.e., DeepSTN+ reduces the error of the crowd flow prediction by approximately 8%∼13% compared with the state-of-the-art baselines.
Deep neural networks (DNNs) are vulnerable to adversarial examples where inputs with imperceptible perturbations mislead DNNs to incorrect results. Recently, adversarial patch, with noise confined to a small and localized patch, emerged for its easy accessibility in real-world. However, existing attack strategies are still far from generating visually natural patches with strong attacking ability, since they often ignore the perceptual sensitivity of the attacked network to the adversarial patch, including both the correlations with the image context and the visual attention. To address this problem, this paper proposes a perceptual-sensitive generative adversarial network (PS-GAN) that can simultaneously enhance the visual fidelity and the attacking ability for the adversarial patch. To improve the visual fidelity, we treat the patch generation as a patch-to-patch translation via an adversarial process, feeding any types of seed patch and outputting the similar adversarial patch with high perceptual correlation with the attacked image. To further enhance the attacking ability, an attention mechanism coupled with adversarial generation is introduced to predict the critical attacking areas for placing the patches, which can help producing more realistic and aggressive patches. Extensive experiments under semi-whitebox and black-box settings on two large-scale datasets GTSRB and ImageNet demonstrate that the proposed PS-GAN outperforms state-of-the-art adversarial patch attack methods.