Metric clustering is a fundamental primitive in machine learning with numerous applications in mining massive datasets. An important example of metric clustering is the k-center problem. While this problem has been extensively studied in distributed settings, all previous algorithms use Ω(k) space per machine and Ω(n k) total work. In this paper, we develop the first highly scalable approximation algorithm for k-center clustering, with O~(n^ε) space per machine and O~(n^(1+ε)) total work, for an arbitrarily small constant ε. It produces an O(log log log n)-approximate solution with k(1+o(1)) centers in O(log log n) rounds of computation.
Graph neural networks (GNNs) have been proven to be effective in various network-related tasks. Most existing GNNs usually exploit the low-frequency signals of node features, which gives rise to one fundamental question: is the low-frequency information all we need in real-world applications? In this paper, we first present an experimental investigation assessing the roles of low-frequency and high-frequency signals, where the results clearly show that exploiting only the low-frequency signal falls far short of learning effective node representations in different scenarios. How can we adaptively learn more information beyond the low-frequency information in GNNs? A well-informed answer can help enhance the adaptability of GNNs. We tackle this challenge and propose novel Frequency Adaptation Graph Convolutional Networks (FAGCN) with a self-gating mechanism, which can adaptively integrate different signals in the process of message passing. For a deeper understanding, we theoretically analyze the roles of low-frequency and high-frequency signals in learning node representations, which further explains why FAGCN can perform well on different types of networks. Extensive experiments on six real-world networks validate that FAGCN not only alleviates the over-smoothing problem, but also has advantages over state-of-the-art methods.
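As a rough, assumption-laden illustration of the self-gating idea (not the paper's exact formulation: FAGCN gates individual edges, while this NumPy sketch uses a node-level gate and an illustrative `gate_weight` parameter), one message-passing step can mix a low-pass and a high-pass propagation of the node features:

```python
import numpy as np

def normalized_adjacency(A):
    """Symmetrically normalized adjacency D^{-1/2} A D^{-1/2}."""
    d = A.sum(axis=1)
    d_inv_sqrt = np.where(d > 0, d ** -0.5, 0.0)
    return A * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]

def frequency_adaptive_layer(H, A, gate_weight):
    """One message-passing step that mixes low- and high-frequency signals.

    Low-pass signal:  (I + A_norm) H  (smooths neighbours together)
    High-pass signal: (I - A_norm) H  (emphasises differences between neighbours)
    A per-node gate in (-1, 1) decides which signal dominates.
    """
    A_norm = normalized_adjacency(A)
    n = A.shape[0]
    low = (np.eye(n) + A_norm) @ H
    high = (np.eye(n) - A_norm) @ H
    gate = np.tanh(H @ gate_weight)            # shape (n, 1), values in (-1, 1)
    return 0.5 * (1 + gate) * low + 0.5 * (1 - gate) * high

# Toy usage: 4 nodes, 8-dimensional features, one scalar gate per node.
A = np.array([[0, 1, 0, 0], [1, 0, 1, 0], [0, 1, 0, 1], [0, 0, 1, 0]], dtype=float)
H = np.random.randn(4, 8)
H_next = frequency_adaptive_layer(H, A, gate_weight=np.random.randn(8, 1))
```

A gate near 1 favours the smoothing low-frequency signal (helpful on assortative graphs), while a gate near -1 favours the high-frequency signal (helpful on disassortative graphs).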
Traditional studies on recommender systems usually leverage only one type of user behavior (the optimization target, such as purchase), despite the fact that users also generate a large amount of interaction data of various other types (e.g., view, click, add-to-cart, etc.). Generally, these heterogeneous multi-relational data provide well-structured information and can be used for high-quality recommendation. Early efforts towards leveraging these heterogeneous data fail to capture the high-hop structure of user-item interactions; as a result, they cannot make full use of the data and may achieve only constrained recommendation performance. In this work, we propose a new multi-relational recommendation model named Graph Heterogeneous Collaborative Filtering (GHCF). To explore the high-hop heterogeneous user-item interactions, we take advantage of the Graph Convolutional Network (GCN) and further improve it to jointly embed representations of both nodes (users and items) and relations for multi-relational prediction. Moreover, to fully utilize the whole heterogeneous data, we perform efficient non-sampling optimization under a multi-task learning framework. Experimental results on two public benchmarks show that GHCF significantly outperforms state-of-the-art recommendation methods, especially for cold-start users who have few primary item interactions. Further analysis verifies the importance of the proposed embedding propagation for modelling high-hop heterogeneous user-item interactions, showing the rationality and effectiveness of GHCF. Our implementation has been released (https://github.com/chenchongthu/GHCF).
Ad creatives are one of the prominent mediums for online e-commerce advertisements. Ad creatives with appealing visual appearance may increase the click-through rate (CTR) of products. Ad creatives are typically handcrafted by advertisers and then delivered to the advertising platforms for advertisement. In recent years, advertising platforms have become capable of instantly compositing ad creatives from arbitrarily designated elements of each ingredient, so advertisers are only required to provide basic materials. While this facilitates advertisers, it also means that a great number of potential ad creatives can be composited, making it difficult to accurately estimate their CTR given limited real-time feedback. To this end, we propose an Adaptive and Efficient ad creative Selection (AES) framework based on a tree structure. The tree structure over compositing ingredients enables dynamic programming for efficient ad creative selection on the basis of CTR. Due to limited feedback, the CTR estimator usually has high variance. Exploration techniques based on Thompson sampling are widely used to reduce the variance of the CTR estimator, alleviating feedback sparsity. Based on the tree structure, Thompson sampling is combined with dynamic programming, leading to efficient exploration of potential ad creatives with the largest CTR. We finally evaluate the proposed algorithm on a synthetic dataset and a real-world dataset. The results show that our approach can outperform competing baselines in terms of convergence rate and overall CTR.
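To make the exploration scheme concrete, below is a heavily simplified Thompson-sampling sketch in Python. It assumes, purely for illustration, that each ingredient contributes independently to the click outcome (so the ingredient "tree" has depth one and the per-ingredient argmax is trivially the dynamic-programming optimum); the class and variable names are hypothetical and this is not the paper's exact algorithm:

```python
import random

class IngredientOption:
    """Beta posterior over the click contribution of one ingredient option."""
    def __init__(self):
        self.alpha, self.beta = 1.0, 1.0          # uniform prior

    def sample(self):
        return random.betavariate(self.alpha, self.beta)

    def update(self, clicked):
        self.alpha += clicked
        self.beta += 1 - clicked

def select_creative(ingredients):
    """Thompson-sampling step: draw a score for every option and pick the
    highest-scoring option per ingredient. Under the independence assumption
    this greedy, per-ingredient choice coincides with the dynamic-programming
    optimum over the (depth-one) ingredient tree."""
    return {name: max(options, key=lambda o: o.sample())
            for name, options in ingredients.items()}

# Hypothetical usage with two ingredients and a few options each.
ingredients = {"background": [IngredientOption() for _ in range(3)],
               "template":   [IngredientOption() for _ in range(2)]}
for _ in range(1000):
    chosen = select_creative(ingredients)
    clicked = 1 if random.random() < 0.1 else 0   # stand-in for real feedback
    for option in chosen.values():
        option.update(clicked)
```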
Dynamic load balancing lies at the heart of distributed caching. Here, the goal is to assign objects (load) to servers (computing nodes) in a way that provides load balancing while at the same time dynamically adjusting to the addition or removal of servers. Load balancing is a critical topic in many areas, including cloud systems, distributed databases, and distributed and data-parallel machine learning. A popular and widely adopted solution to dynamic load balancing is the two-decade-old Consistent Hashing (CH). Recently, an elegant extension was provided to account for server load bounds. In this paper, we identify that existing methodologies for CH and its variants suffer from cascaded overflow, leading to poor load balancing. This cascading effect causes the performance of the hashing procedure to degrade as the load increases. To overcome the cascading effect, we propose a simple alternative to CH based on recent advances in fast minwise hashing. We show, both theoretically and empirically, that our proposed solution is significantly superior for load balancing and is optimal in many senses. On the AOL search dataset and the Indiana University Clicks dataset with real user activity, our proposed solution reduces cache misses by several orders of magnitude.
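For context, the following minimal Python sketch implements the classic consistent hashing baseline with virtual nodes; it is the two-decade-old scheme discussed above, not the proposed minwise-hashing-based solution, and the replica count is an illustrative choice:

```python
import bisect
import hashlib

def _hash(key):
    return int(hashlib.md5(key.encode()).hexdigest(), 16)

class ConsistentHashRing:
    """Classic consistent hashing: servers and objects hash onto a ring and
    each object is assigned to the first server clockwise from its position."""
    def __init__(self, servers, replicas=100):
        self.replicas = replicas
        self._ring = []                            # sorted list of (point, server)
        for s in servers:
            self.add_server(s)

    def add_server(self, server):
        for i in range(self.replicas):             # virtual nodes smooth the load
            bisect.insort(self._ring, (_hash(f"{server}#{i}"), server))

    def remove_server(self, server):
        self._ring = [(p, s) for p, s in self._ring if s != server]

    def lookup(self, obj):
        point = _hash(obj)
        idx = bisect.bisect(self._ring, (point, ""))
        return self._ring[idx % len(self._ring)][1]

ring = ConsistentHashRing(["cache-a", "cache-b", "cache-c"])
print(ring.lookup("user:42"))                      # assignment stays largely stable
ring.remove_server("cache-b")                      # when servers are added/removed
print(ring.lookup("user:42"))
```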
Sequential recommender systems (SRS) have become a research hotspot in recent studies. Because of the need to capture users' dynamic interests, sequential neural-network-based recommender models often need to be stacked with many more hidden layers (e.g., up to 100 layers) than standard collaborative filtering methods. However, the high inference latency of such deep models has become the main obstacle when deploying them in a production environment. In this paper, we argue that the typical prediction framework that treats all users equally during the inference phase is inefficient in running time, as well as sub-optimal in accuracy. To resolve this issue, we present SkipRec, an adaptive inference framework that learns to skip inactive hidden layers on a per-user basis. Specifically, we devise a policy network to automatically determine which layers should be retained and which layers are allowed to be skipped, so as to achieve user-specific decisions. To derive the optimal skipping policy, we propose using Gumbel-Softmax and reinforcement learning to solve the non-differentiability problem during backpropagation. We perform extensive experiments on three real-world recommendation datasets, and demonstrate that SkipRec attains comparable or better accuracy with much less inference time.
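A minimal PyTorch sketch of the layer-skipping idea is given below. It assumes, for illustration, a stack of residual layers and a user embedding as the policy input; `SkipPolicy` and `adaptive_forward` are hypothetical names, and the actual SkipRec architecture and its reinforcement-learning variant differ in detail:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SkipPolicy(nn.Module):
    """Per-user policy that emits a keep/skip decision for every layer.
    Gumbel-Softmax keeps the binary choice differentiable during training."""
    def __init__(self, user_dim, num_layers):
        super().__init__()
        self.logits = nn.Linear(user_dim, num_layers * 2)
        self.num_layers = num_layers

    def forward(self, user_emb, tau=1.0):
        logits = self.logits(user_emb).view(-1, self.num_layers, 2)
        # hard=True gives one-hot decisions in the forward pass while the
        # backward pass uses the soft relaxation (straight-through estimator).
        decisions = F.gumbel_softmax(logits, tau=tau, hard=True)
        return decisions[..., 1]                  # 1 = keep the layer, 0 = skip it

def adaptive_forward(layers, h, keep_mask):
    """Run a stack of residual layers, skipping those whose mask entry is 0."""
    for i, layer in enumerate(layers):
        m = keep_mask[:, i].unsqueeze(-1)
        h = m * layer(h) + (1 - m) * h            # a skipped layer acts as identity
    return h
```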
Single-table text-to-SQL aims to transform a natural language question into a SQL query according to one single table. Recent work has made promising progress on this task by using pre-trained language models and a multi-submodule framework. However, zero-shot tables, i.e., tables unseen in the training set, are currently the most critical bottleneck restricting the application of existing approaches to real-world scenarios. Although some work has utilized auxiliary tasks to help handle zero-shot tables, the expensive extra manual annotation limits their practicality. In this paper, we propose a new approach for the zero-shot text-to-SQL task which does not rely on any additional manual annotations. Our approach consists of two parts. First, we propose a new model that leverages the abundant information of table content to help establish the mapping between questions and zero-shot tables. Second, we propose a simple but efficient meta-learning strategy to train our model. The strategy utilizes a two-step gradient update to force the model to learn a generalization ability towards zero-shot tables. We conduct extensive experiments on a public open-domain text-to-SQL dataset, WikiSQL, and a domain-specific dataset, ESQL. Compared to existing approaches using the same pre-trained model, our approach achieves significant improvements on both datasets. Compared to larger pre-trained models and tabular-specific pre-trained models, our approach remains competitive. More importantly, on the zero-shot subsets of both datasets, our approach achieves even larger improvements.
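The two-step gradient update can be sketched as a first-order, MAML-style training step (a generic meta-learning sketch under simplifying assumptions, not the paper's exact procedure; `loss_fn`, `support_batch`, and `query_batch` are hypothetical placeholders, and `outer_opt` would typically be something like `torch.optim.Adam(model.parameters())`):

```python
import copy
import torch

def meta_train_step(model, loss_fn, support_batch, query_batch,
                    inner_lr=1e-3, outer_opt=None):
    """Two-step gradient update: adapt a copy of the model on tables from the
    support batch, then update the original parameters from the adapted copy's
    loss on a disjoint (pseudo zero-shot) query batch."""
    # Step 1: inner update on the support tables.
    adapted = copy.deepcopy(model)
    support_loss = loss_fn(adapted, support_batch)
    grads = torch.autograd.grad(support_loss, adapted.parameters())
    with torch.no_grad():
        for p, g in zip(adapted.parameters(), grads):
            p -= inner_lr * g

    # Step 2: outer update on the held-out tables (first-order approximation:
    # gradients computed on the adapted copy are applied to the original model).
    query_loss = loss_fn(adapted, query_batch)
    outer_opt.zero_grad()
    query_loss.backward()
    with torch.no_grad():
        for p, p_adapted in zip(model.parameters(), adapted.parameters()):
            p.grad = p_adapted.grad
    outer_opt.step()
    return query_loss.item()
```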
For personalized recommendations, collaborative filtering (CF) methods aim to recommend items to users based on historical user-item interactions. Deep learning has shown success in improving the performance of CF methods in recent works. However, to generate an item recommendation list for each user, many deep learning-based CF methods require every user-item pair to be passed through multiple neural layers. This requires intensive computation and makes real-time end-to-end neural recommendation very costly. To address this issue, in this paper, we propose a new deep learning-based hierarchical decision network to filter out irrelevant items, saving computation cost while maintaining the good recommendation accuracy of deep CF methods. We also develop a distillation-based training algorithm, which uses a well-trained CF model as a teacher network to guide the training of the decision network. We conducted extensive experiments on real-world benchmark datasets to verify the effectiveness and efficiency of our decision network for making recommendations. The experimental results indicate that the proposed decision network is able to maintain or even improve the recommendation quality in terms of various metrics while enjoying lower computational cost.
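A generic knowledge-distillation objective of the kind such a training algorithm could use is sketched below in PyTorch; the temperature, the loss weighting, and the combination of a KL term with binary cross-entropy are illustrative assumptions rather than the paper's exact loss:

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_scores, teacher_scores, labels,
                      temperature=2.0, alpha=0.5):
    """Blend supervision from ground-truth interactions (labels: 0/1 floats)
    with soft targets from a well-trained teacher CF model. Scores are per
    user over a shared candidate set of items."""
    soft_targets = F.softmax(teacher_scores / temperature, dim=-1)
    soft_loss = F.kl_div(F.log_softmax(student_scores / temperature, dim=-1),
                         soft_targets, reduction="batchmean") * temperature ** 2
    hard_loss = F.binary_cross_entropy_with_logits(student_scores, labels)
    return alpha * soft_loss + (1 - alpha) * hard_loss
```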
Tensor decomposition is one of the most effective techniques for multi-criteria recommendations. However, it suffers from data sparsity when dealing with three-dimensional (3D) user-item-criterion ratings. To mitigate this issue, we consider effectively incorporating side information and cross-domain knowledge into tensor decomposition. A deep transfer tensor decomposition (DTTD) method is proposed by integrating a deep structure and Tucker decomposition, where an orthogonality-constrained stacked denoising autoencoder (OC-SDAE) is proposed to alleviate scale variation when learning effective latent representations, and the side information is incorporated as a compensation for tensor sparsity. Tucker decomposition generates private user and item latent factors to connect with the OC-SDAEs and creates a common core tensor to bridge different domains. A cross-domain alignment algorithm (CDAA) is proposed to solve the rotation issue between the two core tensors in the source and target domains. To the best of our knowledge, this is the first work in Tucker-decomposition-based recommendation to use a deep structure to incorporate side information and cross-domain knowledge. Experiments show that DTTD outperforms state-of-the-art related works.
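For background, a plain (non-iterative) Tucker decomposition via truncated higher-order SVD can be written in a few lines of NumPy; DTTD learns its factors jointly with the OC-SDAEs and side information rather than computing them this way, so this is only a reference sketch of the decomposition itself:

```python
import numpy as np

def unfold(tensor, mode):
    """Mode-n unfolding of a tensor into a matrix."""
    return np.moveaxis(tensor, mode, 0).reshape(tensor.shape[mode], -1)

def hosvd(tensor, ranks):
    """Truncated higher-order SVD: T ≈ core ×1 U1 ×2 U2 ×3 U3."""
    factors = []
    for mode, r in enumerate(ranks):
        U, _, _ = np.linalg.svd(unfold(tensor, mode), full_matrices=False)
        factors.append(U[:, :r])
    core = tensor
    for mode, U in enumerate(factors):
        # Contract the mode-th axis of the core with U^T.
        core = np.moveaxis(np.tensordot(U.T, np.moveaxis(core, mode, 0), axes=1),
                           0, mode)
    return core, factors

# Toy user x item x criterion rating tensor.
T = np.random.rand(20, 30, 4)
core, (U_user, U_item, U_crit) = hosvd(T, ranks=(5, 5, 2))
```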
In this paper, we study the problem of embedding uncertain knowledge graphs, where each relation between entities is associated with a confidence score. Observing that existing embedding methods may discard the uncertainty information, only incorporate a specific type of score function, or cause many false-negative samples in training, we propose the PASSLEAF framework to address these issues. PASSLEAF consists of two parts: one is a model that can incorporate different types of scoring functions to predict the relation confidence scores, and the other is a semi-supervised learning model that exploits both positive and negative samples associated with estimated confidence scores. Furthermore, PASSLEAF leverages a sample pool as a relay for generated samples to further augment the semi-supervised learning. Experimental results show that our proposed framework can learn better embeddings, achieving higher accuracy in both confidence score prediction and tail entity prediction.
Given high-dimensional time series data (e.g., sensor data), how can we detect anomalous events, such as system faults and attacks? More challengingly, how can we do this in a way that captures complex inter-sensor relationships, and detects and explains anomalies which deviate from these relationships? Recently, deep learning approaches have enabled improvements in anomaly detection in high-dimensional datasets; however, existing methods do not explicitly learn the structure of existing relationships between variables, or use them to predict the expected behavior of time series. Our approach combines a structure learning approach with graph neural networks, additionally using attention weights to provide explainability for the detected anomalies. Experiments on two real-world sensor datasets with ground truth anomalies show that our method detects anomalies more accurately than baseline approaches, accurately captures correlations between sensors, and allows users to deduce the root cause of a detected anomaly.
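A common way to turn such forecasts into anomaly scores, sketched here as an illustration rather than the paper's exact scoring function, is to take the maximum robustly normalised forecasting error across sensors at each timestamp:

```python
import numpy as np

def anomaly_scores(predicted, observed, eps=1e-6):
    """Score each timestamp by the largest normalised forecasting error over
    sensors; timestamps with high scores are flagged as anomalies and the
    sensor attaining the maximum hints at the root cause."""
    err = np.abs(observed - predicted)                   # shape (time, sensors)
    med = np.median(err, axis=0)
    iqr = np.percentile(err, 75, axis=0) - np.percentile(err, 25, axis=0)
    normalised = (err - med) / (iqr + eps)               # robust per-sensor scaling
    return normalised.max(axis=1)                        # high score => anomalous
```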
Interactive recommender systems involve users in the recommendation procedure by receiving timely user feedback to update the recommendation policy. Therefore, they are widely used in real application scenarios. Previous interactive recommendation methods primarily focus on learning users' personalized preferences over the relevance properties of an item set. However, users' personalized preferences over the diversity properties of an item set are usually ignored. To overcome this problem, we propose the Linear Modular Dispersion Bandit (LMDB) framework, an online learning setting for optimizing a combination of modular functions and dispersion functions. Specifically, LMDB employs modular functions to model the relevance properties of each item, and dispersion functions to describe the diversity properties of an item set. Moreover, we develop a learning algorithm, called Linear Modular Dispersion Hybrid (LMDH), to solve the LMDB problem and derive a gap-free bound on its n-step regret. Extensive experiments on real datasets are performed to demonstrate the effectiveness of the proposed LMDB framework in balancing recommendation accuracy and diversity.
We consider a natural setting where network parameters are estimated from noisy and incomplete information about the network. More specifically, we investigate how we can efficiently estimate the number of small subgraphs (e.g., edges, triangles, etc.) based on full access to one or two noisy and incomplete samples of a large underlying network and on a few queries revealing the neighborhood of carefully selected vertices. After specifying a random generator that removes edges from the underlying graph, we present estimators with strong provable performance guarantees, which exploit information from the noisy network samples and query a constant number of the most important vertices for the estimation. Our experimental evaluation shows that, in practice, a single noisy network sample and a couple of hundred neighborhood queries suffice for accurately estimating the number of triangles in networks with millions of vertices and edges.
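The core correction idea can be illustrated with a Horvitz-Thompson style estimator. Assume, for this sketch only, that each edge of the underlying graph independently survives into the noisy sample with a known probability p (the paper's estimators are more refined and additionally exploit the neighborhood queries of important vertices):

```python
def estimate_counts(observed_edges, observed_triangles, p):
    """If every edge survives independently with probability p, an edge of the
    underlying graph is observed with probability p and a triangle with
    probability p**3, so dividing the observed counts by those probabilities
    gives unbiased estimates of the true counts."""
    return observed_edges / p, observed_triangles / p ** 3

edges_hat, triangles_hat = estimate_counts(observed_edges=8_000_000,
                                            observed_triangles=1_200_000,
                                            p=0.8)
```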
Although static networks have been extensively studied in the machine learning, data mining, and AI communities for many decades, the study of dynamic networks has recently taken center stage due to the prominence of social media and its effects on the dynamics of social networks. In this paper, we propose a statistical model for dynamically evolving networks, together with a variational inference approach. Our model, the Neural Latent Space Model with Variational Inference, encodes edge dependencies across different time snapshots. It represents nodes via latent vectors and uses interaction matrices to model the presence of edges. These matrices can be used to incorporate multiple relations in heterogeneous networks by having a separate matrix for each relation. To capture the temporal dynamics, both node vectors and interaction matrices are allowed to evolve with time. Existing network analysis methods use representation learning techniques for modelling networks. These techniques differ for homogeneous and heterogeneous networks, because heterogeneous networks can have multiple types of edges and nodes as opposed to homogeneous networks. Unlike these, we propose a unified model for homogeneous and heterogeneous networks in a variational inference framework. Moreover, the learned node latent vectors and interaction matrices may be interpretable and therefore provide insights into the mechanisms behind network evolution. We experiment with single-step and multi-step link forecasting on real-world networks of homogeneous, bipartite, and heterogeneous nature, and demonstrate that our model significantly outperforms existing models.
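In latent space models of this kind, the presence of an edge is typically scored by a bilinear form between the two node vectors and a relation-specific interaction matrix; a minimal sketch (illustrative of the general formulation, not the paper's exact parameterisation) is:

```python
import numpy as np

def edge_probability(z_i, z_j, Lambda):
    """Probability of an edge between nodes i and j at a given snapshot, from
    their latent vectors and a relation-specific interaction matrix Lambda
    (bilinear form followed by a sigmoid)."""
    return 1.0 / (1.0 + np.exp(-(z_i @ Lambda @ z_j)))

# Toy usage with 16-dimensional latent vectors and one relation type.
z_i, z_j = np.random.randn(16), np.random.randn(16)
Lambda = np.random.randn(16, 16)
print(edge_probability(z_i, z_j, Lambda))
```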
User modeling is critical for developing personalized services in industry. A common way to model users is to learn user representations that can be distinguished by their interests or preferences. In this work, we focus on developing a universal user representation model. The obtained universal representations are expected to contain rich information and be applicable to various downstream applications without further modifications (e.g., user preference prediction and user profiling). Accordingly, we are freed from the heavy work of training task-specific models for every downstream task, as in previous works. Specifically, we propose the Self-supervised User Modeling Network (SUMN) to encode behavior data into the universal representation. It includes two key components. The first is a new learning objective, which guides the model to fully identify and preserve valuable user information under a self-supervised learning framework. The other is a multi-hop aggregation layer, which improves the model's capacity for aggregating diverse behaviors. Extensive experiments on benchmark datasets show that our approach can outperform state-of-the-art unsupervised representation methods, and even compete with supervised ones.
Match outcome prediction in the group comparison setting is a challenging but important task. Existing works mainly focus on learning individual effects or mining limited interactions between teammates, which is not sufficient for capturing complex interactions between teammates as well as between opponents. Besides, the importance of interacting with different characters is still largely underexplored. To this end, we propose a novel Neural Attentional Cooperation-competition model (NeuralAC), which incorporates weighted cooperation effects (i.e., intra-team interactions) and weighted competition effects (i.e., inter-team interactions) for predicting match outcomes. Specifically, we first project individuals to latent vectors and learn complex interactions through deep neural networks. Then, we design two novel attention-based mechanisms to capture the importance of intra-team and inter-team interactions, which enhance NeuralAC in both accuracy and interpretability. Furthermore, we demonstrate that NeuralAC generalizes several previous models. To evaluate the performance of NeuralAC, we conduct extensive experiments on four E-sports datasets. The experimental results clearly verify the effectiveness of NeuralAC compared with several state-of-the-art methods.
Accurate and timely air quality and weather predictions are of great importance to urban governance and human livelihood. Though many efforts have been made toward air quality or weather prediction, most of them simply use one task's observations as extra input features for the other, ignoring the inner connection between the two predictive tasks. On the one hand, accurate prediction of one task can help improve the other task's performance. On the other hand, geospatially distributed air quality and weather monitoring stations provide additional hints for city-wide spatiotemporal dependency modeling. Inspired by these two insights, in this paper we propose the Multi-adversarial spatiotemporal recurrent Graph Neural Network (MasterGNN) for joint air quality and weather prediction. Specifically, we first propose a heterogeneous recurrent graph neural network to model the spatiotemporal autocorrelation among air quality and weather monitoring stations. Then, we develop a multi-adversarial graph learning framework to combat observation noise propagation introduced by spatiotemporal modeling. Moreover, we introduce an adaptive training strategy by formulating multi-adversarial learning as a multi-task learning problem. Finally, extensive experiments on two real-world datasets show that MasterGNN achieves the best performance compared with seven baselines on both air quality and weather prediction tasks.
When formulated as an unsupervised learning problem, anomaly detection often requires a model to learn the distribution of normal data. Previous works modify Generative Adversarial Networks (GANs) by using encoder-decoders as generators and apply them to anomaly detection tasks. Previous studies also indicate that GAN ensembles are often more stable than single GANs in image generation tasks. In this work, we propose to construct GAN ensembles for anomaly detection. In the proposed method, a group of generators interacts with a group of discriminators, so every generator gets feedback from every discriminator, and vice versa. Compared to a single GAN, an ensemble of GANs can better model the distribution of normal data and thus better detect anomalies. We also provide a theoretical analysis of GANs and GAN ensembles in the context of anomaly detection. The empirical study constructs ensembles based on four different types of base detection models, and the results show that the ensemble outperforms the single model for all four model types.
Using temporal abstraction, various forms of sampled multivariate temporal data can be transformed into a uniform representation of symbolic time intervals (STIs), from which Time Intervals Related Patterns (TIRPs) can then be discovered. Hence, mining TIRPs from symbolic time intervals offers a comprehensive framework for heterogeneous multivariate temporal data analysis. While the field of time intervals mining has gained growing interest in recent decades, frequent closed TIRPs mining has not been investigated in its full complexity. Mining frequent closed TIRPs is highly effective due to the discovery of a compact set of frequent TIRPs, which contains the complete information of all the frequent TIRPs. However, as we demonstrate in this paper, the recent advancements made in closed TIRPs discovery are incomplete, since they discover only the first instances of the TIRPs within each STI series in the database. In this paper we introduce the TIRPClo algorithm for the complete and efficient mining of frequent closed TIRPs. The algorithm utilizes a memory-efficient index and a novel method for data projection, thanks to which it is the first algorithm to guarantee a complete discovery of frequent closed TIRPs. In addition, a rigorous runtime comparison of TIRPClo to state-of-the-art methods is performed, demonstrating a significant speed-up on various real-world datasets.
This paper explores a new online learning problem where the input sequence lives in a feature space that varies over time and the ground-truth label of any input point is given only occasionally, making online learners less restrictive and more applicable. The crux of this setting lies in how to exploit the very limited labels to efficiently update the online learners. Plausible ideas such as propagating labels from labeled points to their neighbors by uncovering the point-wise geometric relations face two challenges: (1) distance measurement fails to work, as different points may be described by disparate sets of features, and (2) storing the geometric shape formed by all arrived points is unrealistic in an online setting. To address these challenges, we first construct a universal feature space that accumulates all observed features, making distance measurement feasible. Then, we use manifolds to represent the geometric shapes and approximate them in a sparse manner, making the manifolds tractable in both computation and memory for online learning. We frame these two building blocks into a regularized risk minimization algorithm. Theoretical analysis and empirical evidence substantiate the viability and effectiveness of our proposal.
The social recommendation task aims to predict users' preferences over items by incorporating social connections among users, so as to alleviate the sparsity issue of collaborative filtering. While many recent efforts show the effectiveness of neural network-based social recommender systems, several important challenges have not been well addressed yet: (i) the majority of models only consider users' social connections, while ignoring the inter-dependent knowledge across items; (ii) most existing solutions are designed for a single type of user-item interaction, making them unable to capture interaction heterogeneity; (iii) the dynamic nature of user-item interactions has been less explored in many social-aware recommendation techniques. To tackle the above challenges, this work proposes a Knowledge-aware Coupled Graph Neural Network (KCGN) that jointly injects the inter-dependent knowledge across items and users into the recommendation framework. KCGN enables high-order user- and item-wise relation encoding by exploiting mutual information for global graph structure awareness. Additionally, we further augment KCGN with the capability of capturing dynamic multi-typed user-item interactive patterns. Experimental studies on real-world datasets show the effectiveness of our method against many strong baselines in a variety of settings. Source code is available at: https://github.com/xhcdream/KCGN.
Session-based recommendation plays a central role in a wide spectrum of online applications, ranging from e-commerce to online advertising services. However, the majority of existing session-based recommendation techniques (e.g., attention-based recurrent networks or graph neural networks) are not well designed for capturing the complex transition dynamics exhibited by temporally-ordered and multi-level interdependent relation structures. These methods largely overlook the relation hierarchy of item transitional patterns. In this paper, we propose a multi-task learning framework with Multi-level Transition Dynamics (MTD), which enables the joint learning of intra- and inter-session item transition dynamics in an automatic and hierarchical manner. Towards this end, we first develop a position-aware attention mechanism to learn item transitional regularities within individual sessions. Then, a graph-structured hierarchical relation encoder is proposed to explicitly capture the cross-session item transitions in the form of high-order connectivities by performing embedding propagation with the global graph context. The learning processes of intra- and inter-session transition dynamics are integrated to preserve the underlying low- and high-level item relationships in a common latent space. Extensive experiments on three real-world datasets demonstrate the superiority of MTD compared to state-of-the-art baselines.
This paper addresses the task of explaining anomalous predictions of a black-box regression model. When using a black-box model, such as one to predict building energy consumption from many sensor measurements, we often have a situation where some observed samples significantly deviate from their predictions. This may be due to a sub-optimal black-box model, or simply because those samples are outliers. In either case, one would ideally want to compute a responsibility score indicative of the extent to which an input variable is responsible for the anomalous output. In this work, we formalize this task as a statistical inverse problem: given the model's deviation from the expected value, infer the responsibility score of each input variable. We propose a new method called likelihood compensation (LC), which is founded on the likelihood principle and computes a correction to each input variable. To the best of our knowledge, this is the first principled framework that computes responsibility scores for real-valued anomalous model deviations. We apply our approach to a real-world building energy prediction task and confirm its utility based on expert feedback.
Hyperspectral anomaly detection (HAD) is a challenging task because it explores the intrinsic structure of complex high-dimensional signals without any samples at training time. Deep neural networks (DNNs) can dig out the underlying distribution of hyperspectral data but are limited by the difficulty of labeling large-scale hyperspectral datasets, especially given the low spatial resolution of hyperspectral data, which makes labeling even more difficult. To tackle this problem while ensuring detection performance, we present an unsupervised low-rank embedded network (LREN) in this paper. LREN is a joint learning network in which the latent representation is specifically designed for HAD, rather than merely serving as a feature input for the detector. It searches for the lowest-rank representation based on a representative and discriminative dictionary in the deep latent space to estimate the residual efficiently. Considering the physical mixing properties of hyperspectral imaging, we develop a trainable density estimation module based on a Gaussian mixture model (GMM) in the deep latent space to construct a dictionary that can better characterize complex hyperspectral images (HSIs). The proposed low-rank learner admits a closed-form solution, and the resulting method surpasses existing approaches on four real hyperspectral datasets with different anomalies. We argue that this unified framework paves a novel way to combine feature extraction and anomaly estimation-based methods for HAD, aiming to learn the underlying representation tailored for HAD without the prerequisite of manually labeled data. Code is available at https://github.com/xdjiangkai/LREN.
Since the recent studies (KDD'20) by Krichene and Rendle on sampling-based top-k evaluation metrics for recommendation, there has been much debate on the validity of using sampling for evaluating recommendation algorithms. Though their work and the recent work by Li et al. (KDD'20) have proposed basic approaches for mapping the sampling-based metrics to their counterparts in global evaluation, which uses the entire dataset, there is still a lack of understanding of how sampling should be used for recommendation evaluation, and the proposed approaches are either rather ad hoc or only work on simple metrics such as Recall/Hit-Ratio. In this paper, we introduce principled approaches to deriving estimators of top-k metrics based on sampling. Our approaches utilize weighted MLE and maximal entropy to recover the global rank distribution and then use it for estimation. The experimental results show significant advantages of using our approaches for evaluating recommendation algorithms based on top-k metrics.
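To illustrate the idea of recovering the global rank distribution from sampled ranks, here is an EM-style sketch in Python. It relies on the commonly used approximation that, given a global rank R among n_items, the sampled rank against n_samples sampled negatives follows Binomial(n_samples, (R-1)/(n_items-1)); this is an illustrative mixture estimator, not the paper's weighted-MLE or maximal-entropy procedure:

```python
import numpy as np
from scipy.stats import binom

def recover_rank_distribution(sampled_ranks, n_samples, n_items, iters=200):
    """EM sketch: treat each observed sampled rank as drawn from a mixture over
    unknown global ranks and estimate the mixture weights, i.e. P(global rank)."""
    R = np.arange(1, n_items + 1)
    p = np.full(n_items, 1.0 / n_items)                 # uniform prior over ranks
    # likelihood[i, j] = P(sampled rank r_i | global rank R_j)
    lik = np.stack([binom.pmf(r, n_samples, (R - 1) / (n_items - 1))
                    for r in sampled_ranks])
    for _ in range(iters):
        post = lik * p                                   # E-step (unnormalised)
        post /= post.sum(axis=1, keepdims=True)
        p = post.mean(axis=0)                            # M-step
    return p

# Any top-k metric (e.g. Recall@k) then follows from the probability mass that
# the recovered distribution places on ranks <= k.
```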