IJCAI.2022 - Data Mining

| Total: 75

#1 Doubly Sparse Asynchronous Learning for Stochastic Composite Optimization [PDF] [Copy] [Kimi] [REL]

Authors: Runxue Bao ; Xidong Wu ; Wenhan Xian ; Heng Huang

Parallel optimization has become popular for large-scale learning in the past decades. However, existing methods suffer from huge computational costs, memory usage, and communication burdens in high-dimensional scenarios. To address these challenges, we propose a new accelerated doubly sparse asynchronous learning (DSAL) method for stochastic composite optimization, under which two algorithms are proposed for shared-memory and distributed-memory architectures respectively, conducting gradient descent only on the nonzero coordinates (data sparsity) and the active set (model sparsity). The proposed algorithms converge much faster and achieve significant speedup by simultaneously exploiting the sparsity of both the model and the data. Moreover, by sending only the gradients on the active set, communication costs are dramatically reduced. Theoretically, we prove that the proposed method achieves a linear convergence rate with lower overall complexity and achieves model identification in a finite number of iterations almost surely. Finally, extensive experimental results on benchmark datasets confirm the superiority of our proposed method.

#2 Hypergraph Structure Learning for Hypergraph Neural Networks [PDF] [Copy] [Kimi] [REL]

Authors: Derun Cai ; Moxian Song ; Chenxi Sun ; Baofeng Zhang ; Shenda Hong ; Hongyan Li

Hypergraphs are natural and expressive modeling tools to encode high-order relationships among entities. Several variations of Hypergraph Neural Networks (HGNNs) have been proposed to learn node representations and complex relationships in hypergraphs. Most current approaches assume that the input hypergraph structure accurately depicts the relations in the hypergraphs. However, the input hypergraph structure inevitably contains noise, task-irrelevant information, or false-negative connections. Treating the input hypergraph structure as ground-truth information unavoidably leads to sub-optimal performance. In this paper, we propose a Hypergraph Structure Learning (HSL) framework, which optimizes the hypergraph structure and the HGNNs simultaneously in an end-to-end way. HSL learns an informative and concise hypergraph structure that is optimized for downstream tasks. To efficiently learn the hypergraph structure, HSL adopts a two-stage sampling process: hyperedge sampling for pruning redundant hyperedges and incident node sampling for pruning irrelevant incident nodes and discovering potential implicit connections. The consistency between the optimized structure and the original structure is maintained by the intra-hyperedge contrastive learning module. The sampling processes are jointly optimized with HGNNs towards the objective of the downstream tasks. Experiments conducted on 7 datasets show that HSL outperforms the state-of-the-art baselines while adaptively sparsifying hypergraph structures.

#3 Entity Alignment with Reliable Path Reasoning and Relation-aware Heterogeneous Graph Transformer [PDF] [Copy] [Kimi] [REL]

Authors: Weishan Cai ; Wenjun Ma ; Jieyu Zhan ; Yuncheng Jiang

Entity Alignment (EA) has attracted widespread attention in both academia and industry; it aims to seek entities with the same meaning from different Knowledge Graphs (KGs). There are substantial multi-step relation paths between entities in KGs, indicating the semantic relations of entities. However, existing methods rarely consider path information because not all natural paths facilitate EA judgment. In this paper, we propose a more effective entity alignment framework, RPR-RHGT, which integrates relation and path structure information, as well as the heterogeneous information in KGs. Impressively, an initial reliable path reasoning algorithm is developed to generate the paths favorable for the EA task from the relation structures of KGs. This is the first algorithm in the literature to successfully use unrestricted path information. In addition, to efficiently capture heterogeneous features in entity neighborhoods, a relation-aware heterogeneous graph transformer is designed to model the relation and path structures of KGs. Extensive experiments on three well-known datasets show that RPR-RHGT significantly outperforms 10 state-of-the-art methods, exceeding the best-performing baseline by up to 8.62% on Hits@1. We also show its better performance than the baselines on different ratios of the training set and on harder datasets.

#4 Non-Euclidean Self-Organizing Maps [PDF] [Copy] [Kimi] [REL]

Authors: Dorota Celińska-Kopczyńska ; Eryk Kopczyński

Self-Organizing Maps (SOMs, Kohonen networks) belong to the unsupervised class of neural network models. In this paper, we present a generalized setup for non-Euclidean SOMs. Most data analysts take it for granted to use some subregion of a flat space as their data model; however, by assuming that the underlying geometry is non-Euclidean, we obtain a new degree of freedom for the techniques that translate similarities into spatial neighborhood relationships. We improve the traditional SOM algorithm by introducing topology-related extensions. Our proposal can be successfully applied to dimension reduction, clustering, or finding similarities in big data (both hierarchical and non-hierarchical).

#5 Can Abnormality be Detected by Graph Neural Networks? [PDF] [Copy] [Kimi] [REL]

Authors: Ziwei Chai ; Siqi You ; Yang Yang ; Shiliang Pu ; Jiarong Xu ; Haoyang Cai ; Weihao Jiang

Anomaly detection in graphs has attracted considerable interest in both academia and industry due to its wide applications in numerous domains ranging from finance to biology. Meanwhile, graph neural networks (GNNs) are emerging as a powerful tool for modeling graph data. A natural and fundamental question that arises here is: can abnormality be detected by graph neural networks? In this paper, we aim to answer this question, which is nontrivial. As many existing works have explored, graph neural networks can be seen as filters for graph signals that favor the low-frequency components. In other words, a GNN will smooth the signals of adjacent nodes. However, an abnormal node in a graph intuitively tends to be dissimilar to its neighbors, which are mostly normal samples. This conflicts with the general assumption of traditional GNNs. To solve this, we propose a novel Adaptive Multi-frequency Graph Neural Network (AMNet), aiming to capture both low-frequency and high-frequency signals, and adaptively combine signals of different frequencies. Experimental results on real-world datasets demonstrate that our model achieves a significant improvement compared with several state-of-the-art baseline methods.
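
The low-/high-frequency intuition in the abstract can be made concrete with the normalized graph Laplacian: a low-pass filter smooths each node's signal toward its neighbors, while a high-pass filter highlights nodes that disagree with them. The sketch below is a minimal NumPy illustration of this filtering idea, not AMNet's actual architecture; the toy graph, signal, and fixed weight `alpha` are assumptions for demonstration only.

```python
import numpy as np

def normalized_laplacian(A):
    """Symmetric normalized Laplacian L = I - D^{-1/2} A D^{-1/2}."""
    d = A.sum(axis=1)
    d_inv_sqrt = np.where(d > 0, d ** -0.5, 0.0)
    return np.eye(len(A)) - d_inv_sqrt[:, None] * A * d_inv_sqrt[None, :]

# Toy graph: a path of 4 nodes with node signal x; node 3 is "abnormal".
A = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
x = np.array([1.0, 1.0, 1.0, 5.0])

L = normalized_laplacian(A)
low = (np.eye(4) - L) @ x   # low-pass: smooths x toward neighbors
high = L @ x                # high-pass: emphasizes disagreement with neighbors

# An adaptive combination in the spirit of AMNet (alpha would be learned):
alpha = 0.5
combined = alpha * low + (1 - alpha) * high
```

On this path graph, the node with the outlying signal value receives by far the largest high-frequency response, which is exactly the property a multi-frequency detector can exploit.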

#6 Robust High-Dimensional Classification From Few Positive Examples [PDF] [Copy] [Kimi] [REL]

Authors: Deepayan Chakrabarti ; Benjamin Fauber

We tackle an extreme form of imbalanced classification, with up to 10^5 features but as few as 5 samples from the minority class. This problem occurs in predicting tumor types and in fraud detection, among others. Standard imbalanced classification methods are not designed for such severe data scarcity. Sampling-based methods need too many samples due to the high dimensionality, while cost-based methods must place too high a weight on the limited minority samples. Our proposed method, called DIRECT, bypasses sample generation by training the classifier over a robust smoothed distribution of the minority class. DIRECT is fast, simple, robust, parameter-free, and easy to interpret. We validate DIRECT on several real-world datasets spanning document, image, and medical classification. DIRECT is up to 5x−7x better than SMOTE-like methods, 30−200% better than ensemble methods, and 3x−7x better than cost-sensitive methods. The greatest gains are for settings with the fewest samples in the minority class, where DIRECT's robustness is most helpful.

#7 Vertically Federated Graph Neural Network for Privacy-Preserving Node Classification [PDF] [Copy] [Kimi] [REL]

Authors: Chaochao Chen ; Jun Zhou ; Longfei Zheng ; Huiwen Wu ; Lingjuan Lyu ; Jia Wu ; Bingzhe Wu ; Ziqi Liu ; Li Wang ; Xiaolin Zheng

Recently, Graph Neural Networks (GNNs) have achieved remarkable progress in various real-world tasks on graph data, which consist of node features and adjacency information between different nodes. High-performance GNN models always depend on both rich features and complete edge information in the graph. However, such information could possibly be isolated by different data holders in practice, which is the so-called data isolation problem. To solve this problem, in this paper, we propose VFGNN, a federated GNN learning paradigm for the privacy-preserving node classification task under the vertically partitioned data setting, which can be generalized to existing GNN models. Specifically, we split the computation graph into two parts. We leave the private data (i.e., features, edges, and labels) related computations on data holders, and delegate the rest of the computations to a semi-honest server. We also propose to apply differential privacy to prevent potential information leakage from the server. We conduct experiments on three benchmarks and the results demonstrate the effectiveness of VFGNN.
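
The computation-splitting idea can be sketched in a few lines: the data holder computes embeddings over its private features locally and releases only a noised version to the server. The snippet below is a generic Laplace-mechanism sketch under an assumed sensitivity, not VFGNN's concrete protocol; all names and shapes are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

def local_embedding(features, weight):
    """The data holder computes its part of the embedding locally,
    so raw private features never leave its premises."""
    return features @ weight

def dp_release(h, epsilon, sensitivity=1.0):
    """Laplace mechanism: add noise to the embedding before sending it
    to the semi-honest server (a standard DP sketch, not VFGNN's exact scheme)."""
    noise = rng.laplace(scale=sensitivity / epsilon, size=h.shape)
    return h + noise

X = rng.standard_normal((5, 3))   # 5 nodes, 3 private features
W = rng.standard_normal((3, 2))   # local projection weights
h = local_embedding(X, W)
h_noisy = dp_release(h, epsilon=1.0)   # only this leaves the data holder
```

A smaller epsilon yields stronger privacy (more noise) at the cost of downstream accuracy, which mirrors the privacy-utility trade-off the paper evaluates.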

#8 Meta-Learning Based Knowledge Extrapolation for Knowledge Graphs in the Federated Setting [PDF] [Copy] [Kimi] [REL]

Authors: Mingyang Chen ; Wen Zhang ; Zhen Yao ; Xiangnan Chen ; Mengxiao Ding ; Fei Huang ; Huajun Chen

We study the knowledge extrapolation problem to embed new components (i.e., entities and relations) that come with emerging knowledge graphs (KGs) in the federated setting. In this problem, a model trained on an existing KG needs to embed an emerging KG with unseen entities and relations. To solve this problem, we introduce the meta-learning setting, where a set of tasks is sampled on the existing KG to mimic the link prediction task on the emerging KG. Based on the sampled tasks, we meta-train a graph neural network framework that can construct features for unseen components based on structural information and output embeddings for them. Experimental results show that our proposed method can effectively embed unseen components and outperforms models that consider inductive settings for KGs as well as baselines that directly use conventional KG embedding methods.

#9 Mutual Distillation Learning Network for Trajectory-User Linking [PDF] [Copy] [Kimi] [REL]

Authors: Wei Chen ; ShuZhe Li ; Chao Huang ; Yanwei Yu ; Yongguo Jiang ; Junyu Dong

Trajectory-User Linking (TUL), which links trajectories to the users who generate them, has been a challenging problem due to the sparsity of check-in mobility data. Existing methods ignore the utilization of historical data or rich contextual features in check-in data, resulting in poor performance on the TUL task. In this paper, we propose MainTUL, a novel mutual distillation learning network, to solve the TUL problem for sparse check-in mobility data. Specifically, MainTUL is composed of a Recurrent Neural Network (RNN) trajectory encoder that models sequential patterns of the input trajectory and a temporal-aware Transformer trajectory encoder that captures long-term time dependencies for the corresponding augmented historical trajectories. Then, the knowledge learned on historical trajectories is transferred between the two trajectory encoders to guide the learning of both encoders and achieve mutual distillation of information. Experimental results on two real-world check-in mobility datasets demonstrate the superiority of MainTUL against state-of-the-art baselines. The source code of our model is available at https://github.com/Onedean/MainTUL.
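
The mutual distillation between the two encoders can be illustrated as a symmetric KL divergence on their softened predictions over candidate users, so that each encoder teaches the other. This is a generic knowledge-distillation sketch with made-up logits and temperature, not MainTUL's exact objective.

```python
import numpy as np

def softmax(z, T=1.0):
    """Temperature-softened softmax; larger T flattens the distribution."""
    z = np.asarray(z, dtype=float) / T
    z = z - z.max()                  # numerical stability
    e = np.exp(z)
    return e / e.sum()

def kl(p, q, eps=1e-12):
    """KL(p || q) for discrete distributions."""
    p, q = np.clip(p, eps, 1.0), np.clip(q, eps, 1.0)
    return float(np.sum(p * np.log(p / q)))

# Hypothetical logits over 3 candidate users from the two encoders.
rnn_logits = np.array([2.0, 0.5, -1.0])
trm_logits = np.array([1.5, 1.0, -0.5])

T = 2.0  # distillation temperature (an assumption)
p_rnn, p_trm = softmax(rnn_logits, T), softmax(trm_logits, T)

# Symmetric mutual-distillation loss: each encoder distills into the other.
mutual_loss = kl(p_trm, p_rnn) + kl(p_rnn, p_trm)
```

The loss vanishes exactly when both encoders agree, which is the fixed point mutual distillation pushes towards.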

#10 Towards Robust Dense Retrieval via Local Ranking Alignment [PDF] [Copy] [Kimi] [REL]

Authors: Xuanang Chen ; Jian Luo ; Ben He ; Le Sun ; Yingfei Sun

Dense retrieval (DR) has extended the employment of pre-trained language models, like BERT, to text ranking. However, recent studies have raised the robustness issue of DR models against query variations, like queries with typos, which cause non-trivial performance losses. Herein, we argue that it would be beneficial to allow the DR model to learn to align the relative positions of query-passage pairs in the representation space, as query variations cause the query vector to drift away from its original position, affecting the subsequent DR effectiveness. To this end, we propose RoDR, a novel robust DR model that learns to calibrate the in-batch local ranking of the query variation to that of the original query for DR space alignment. Extensive experiments on the MS MARCO and ANTIQUE datasets show that RoDR significantly improves the retrieval results on both the original queries and different types of query variations. Meanwhile, RoDR provides a general query noise-tolerant learning framework that boosts the robustness and effectiveness of various existing DR models. Our code and models are openly available at https://github.com/cxa-unique/RoDR.

#11 Filtration-Enhanced Graph Transformation [PDF] [Copy] [Kimi] [REL]

Authors: Zijian Chen ; Rong-Hua Li ; Hongchao Qin ; Huanzhong Duan ; Yanxiong Lu ; Qiangqiang Dai ; Guoren Wang

Graph kernels and graph neural networks (GNNs) are widely used for the classification of graph data. However, many existing graph kernels and GNNs have limited expressive power, because they cannot distinguish graphs if the classic 1-dimensional Weisfeiler-Leman (1-WL) algorithm does not distinguish them. To break the 1-WL expressiveness barrier, we propose a novel method called filtration-enhanced graph transformation, which is based on a concept from the area of topological data analysis. In a nutshell, our approach first transforms each original graph into a filtration-enhanced graph based on a certain pre-defined filtration operation, and then uses the transformed graphs as the inputs for graph kernels or GNNs. The striking feature of our approach is that it is a plug-in method that can be applied to any graph kernel and GNN to enhance their expressive power. We theoretically and experimentally demonstrate that our solutions exhibit significantly better performance than the state-of-the-art solutions for graph classification tasks.

#12 Triformer: Triangular, Variable-Specific Attentions for Long Sequence Multivariate Time Series Forecasting [PDF] [Copy] [Kimi] [REL]

Authors: Razvan-Gabriel Cirstea ; Chenjuan Guo ; Bin Yang ; Tung Kieu ; Xuanyi Dong ; Shirui Pan

A variety of real-world applications rely on far-future information to make decisions, thus calling for efficient and accurate long sequence multivariate time series forecasting. While recent attention-based forecasting models show strong abilities in capturing long-term dependencies, they still suffer from two key limitations. First, canonical self-attention has a quadratic complexity w.r.t. the input time series length, thus falling short in efficiency. Second, different variables' time series often have distinct temporal dynamics, which existing studies fail to capture, as they use the same model parameter space, e.g., projection matrices, for all variables' time series, thus falling short in accuracy. To ensure high efficiency and accuracy, we propose Triformer, a triangular, variable-specific attention. (i) Linear complexity: we introduce a novel patch attention with linear complexity. When stacking multiple layers of the patch attentions, a triangular structure is proposed such that the layer sizes shrink exponentially, thus maintaining linear complexity. (ii) Variable-specific parameters: we propose a light-weight method to enable distinct sets of model parameters for different variables' time series to enhance accuracy without compromising efficiency and memory usage. Strong empirical evidence on four datasets from multiple domains justifies our design choices, and it demonstrates that Triformer outperforms state-of-the-art methods w.r.t. both accuracy and efficiency. Source code is publicly available at https://github.com/razvanc92/triformer.
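
A rough sketch of why patch attention is linear: restricting self-attention to non-overlapping patches of size P costs O(T·P) per layer instead of O(T²) for a length-T series. The function below is an illustrative NumPy version of patch-restricted attention, not Triformer's implementation (which additionally stacks shrinking layers into a triangular structure); the toy shapes are assumptions.

```python
import numpy as np

def patch_attention(x, patch_size):
    """Self-attention restricted to non-overlapping patches.
    Cost is O(T * patch_size) instead of O(T^2) for a length-T series."""
    T, d = x.shape
    assert T % patch_size == 0
    out = np.empty_like(x)
    for s in range(0, T, patch_size):
        p = x[s:s + patch_size]                     # one patch
        scores = p @ p.T / np.sqrt(d)               # attention within patch only
        w = np.exp(scores - scores.max(axis=1, keepdims=True))
        w /= w.sum(axis=1, keepdims=True)           # row-wise softmax
        out[s:s + patch_size] = w @ p
    return out

rng = np.random.default_rng(0)
x = rng.standard_normal((8, 4))   # T=8 timesteps, d=4 features
y = patch_attention(x, patch_size=4)
```

Because attention never crosses patch boundaries, perturbing a timestep in one patch leaves the outputs of every other patch unchanged; cross-patch information flow is then recovered by the stacked layers.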

#13 CADET: Calibrated Anomaly Detection for Mitigating Hardness Bias [PDF] [Copy] [Kimi] [REL]

Authors: Ailin Deng ; Adam Goodge ; Lang Yi Ang ; Bryan Hooi

The detection of anomalous samples in large, high-dimensional datasets is a challenging task with numerous practical applications. Recently, state-of-the-art performance has been achieved with deep learning methods: for example, using the reconstruction error from an autoencoder as the anomaly score. However, the scores are uncalibrated: that is, they follow an unknown distribution and lack a clear interpretation. Furthermore, the reconstruction error is highly influenced by the 'hardness' of a given sample, which leads to false negative and false positive errors. In this paper, we empirically show the significance of this hardness bias present in a range of recent deep anomaly detection methods. To mitigate this, we propose an efficient and plug-and-play error calibration method which removes this hardness bias from the anomaly scoring without the need to retrain the model. We verify the effectiveness of our method on a range of image, time-series, and tabular datasets and against several baseline methods.
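
The hardness bias and the calibration idea can be seen on synthetic scores: when raw reconstruction error grows with sample hardness, a hard-but-normal sample can outscore an easy anomaly, while dividing by the expected error at that hardness level corrects the ranking. The snippet is a toy illustration of post-hoc score calibration, not the paper's method; the hardness proxy and the linear fit are assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)

# Synthetic setup: "harder" normal samples naturally get larger
# reconstruction errors, which biases raw anomaly scores.
hardness = rng.uniform(0.5, 2.0, size=200)                   # hardness proxy
raw_error = hardness * (1 + 0.1 * rng.standard_normal(200))  # held-out normals

# Calibrate on held-out normals: model expected error as a function of
# hardness (here a simple linear fit), then score by the ratio.
coef = np.polyfit(hardness, raw_error, deg=1)

# A hard-but-normal sample vs. a genuinely anomalous easy sample:
hard_normal = 2.0 * 1.0    # hardness 2.0, raw error ~2 purely from hardness
easy_anomaly = 0.6 * 3.0   # hardness 0.6, but error inflated 3x by anomaly
score_hn = hard_normal / np.polyval(coef, 2.0)
score_ea = easy_anomaly / np.polyval(coef, 0.6)
```

The raw errors rank the normal sample above the anomaly; after dividing by the expected error for each hardness level, the anomaly correctly scores higher, and no model retraining was needed.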

#14 Private Semi-Supervised Federated Learning [PDF] [Copy] [Kimi] [REL]

Authors: Chenyou Fan ; Junjie Hu ; Jianwei Huang

We study a federated learning (FL) framework to effectively train models from scarce and skewed labeled data. We consider a challenging yet practical scenario: a few data sources own a small amount of labeled data, while the remaining mass of sources own purely unlabeled data. Classical FL requires each client to have enough labeled data for local training, and thus is not applicable in this scenario. In this work, we design an effective federated semi-supervised learning framework (FedSSL) to fully leverage both labeled and unlabeled data sources. We establish a unified data space across all participating agents, so that each agent can generate mixed data samples to boost semi-supervised learning (SSL), while keeping data locality. We further show that FedSSL can integrate differential privacy protection techniques to prevent labeled data leakage at the cost of minimal performance degradation. On SSL tasks with as little as 0.17% and 1% of the MNIST and CIFAR-10 datasets as labeled data, respectively, our approach achieves a 5-20% performance boost over the state-of-the-art methods.

#15 Feature and Instance Joint Selection: A Reinforcement Learning Perspective [PDF] [Copy] [Kimi] [REL]

Authors: Wei Fan ; Kunpeng Liu ; Hao Liu ; Hengshu Zhu ; Hui Xiong ; Yanjie Fu

Feature selection and instance selection are two important techniques of data processing. However, such selections have mostly been studied separately, while existing work towards joint selection conducts feature/instance selection coarsely, thus neglecting the latent fine-grained interaction between the feature space and the instance space. To address this challenge, we propose a reinforcement learning solution to accomplish the joint selection task and simultaneously capture the interaction between the selection of each feature and each instance. In particular, a sequential-scanning mechanism is designed as the action strategy of the agents, and a collaborative-changing environment is used to enhance agent collaboration. In addition, an interactive paradigm introduces prior selection knowledge to help agents explore more efficiently. Finally, extensive experiments on real-world datasets have demonstrated improved performance.

#16 MetaER-TTE: An Adaptive Meta-learning Model for En Route Travel Time Estimation [PDF] [Copy] [Kimi] [REL]

Authors: Yu Fan ; Jiajie Xu ; Rui Zhou ; Jianxin Li ; Kai Zheng ; Lu Chen ; Chengfei Liu

En route travel time estimation (ER-TTE) aims to predict the travel time on the remaining route. Since the traveled and remaining parts of a trip usually have some common characteristics like driving speed, it is desirable to explore these characteristics for improved performance via effective adaptation. This, however, faces the severe problem of data sparsity due to the few sampled points in a traveled partial trajectory. Since trajectories with different contextual information tend to have different characteristics, the existing meta-learning method for ER-TTE cannot fit each trajectory well because it uses the same model for all trajectories. To this end, we propose a novel adaptive meta-learning model called MetaER-TTE. Particularly, we utilize soft-clustering and derive cluster-aware initialized parameters to better transfer the shared knowledge across trajectories with similar contextual information. In addition, we adopt a distribution-aware approach for adaptive learning rate optimization, so as to avoid the task-overfitting that occurs when guiding the initial parameters with a fixed learning rate for tasks under an imbalanced distribution. Finally, we conduct comprehensive experiments to demonstrate the superiority of MetaER-TTE.

#17 When Transfer Learning Meets Cross-City Urban Flow Prediction: Spatio-Temporal Adaptation Matters [PDF] [Copy] [Kimi] [REL]

Authors: Ziquan Fang ; Dongen Wu ; Lu Pan ; Lu Chen ; Yunjun Gao

Urban flow prediction is a fundamental task in building smart cities, where neural networks have become the most popular method. However, deep learning methods typically rely on massive training data that are probably inaccessible in the real world. In light of this, the community calls for knowledge transfer. However, when adapting transfer learning for cross-city prediction tasks, existing studies are built on static knowledge transfer, ignoring the fact that inter-city correlations change dynamically over time. The dynamic correlations make urban feature transfer challenging. This paper proposes a novel Spatio-Temporal Adaptation Network (STAN) to perform urban flow prediction for data-scarce cities via spatio-temporal knowledge transferred from data-rich cities. STAN encompasses three modules: i) a spatial adversarial adaptation module that adopts an adversarial manner to capture transferable spatial features; ii) a temporal attentive adaptation module that attends to critical dynamics for temporal feature transfer; iii) a prediction module that aims to learn task-driven transferable knowledge. Extensive experiments on five real datasets show STAN substantially outperforms state-of-the-art methods.

#18 Disentangling the Computational Complexity of Network Untangling [PDF] [Copy] [Kimi] [REL]

Authors: Vincent Froese ; Pascal Kunz ; Philipp Zschoche

We study the recently introduced network untangling problem, a variant of Vertex Cover on temporal graphs---graphs whose edge set changes over discrete time steps. There are two versions of this problem. The goal is to select at most k time intervals for each vertex such that all time-edges are covered and (depending on the problem variant) either the maximum interval length or the total sum of interval lengths is minimized. This problem has data mining applications in finding activity timelines that explain the interactions of entities in complex networks. Both variants of the problem are NP-hard. In this paper, we initiate a multivariate complexity analysis involving the following parameters: number of vertices, lifetime of the temporal graph, number of intervals per vertex, and the interval length bound. For both problem versions, we (almost) completely settle the parameterized complexity for all combinations of those four parameters, thereby delineating the border of fixed-parameter tractability.
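
The covering constraint is easy to state in code: a time-edge (u, v, t) is covered when t falls inside an active interval of u or of v, and the total-sum variant minimizes the summed interval lengths. Below is a small feasibility-and-cost checker over discrete time steps; the toy temporal graph and candidate solution are assumptions for illustration.

```python
def covers(temporal_edges, intervals):
    """Check that every time-edge (u, v, t) is covered: at least one
    endpoint has an active interval containing t. Returns the total
    length of all intervals if feasible, else None."""
    def active(vertex, t):
        return any(a <= t <= b for (a, b) in intervals.get(vertex, []))
    if all(active(u, t) or active(v, t) for (u, v, t) in temporal_edges):
        # Interval [a, b] over discrete steps has length b - a + 1.
        return sum(b - a + 1 for ivs in intervals.values() for (a, b) in ivs)
    return None

# Temporal graph: edges (u, v, t) appearing at discrete time steps.
edges = [("a", "b", 1), ("b", "c", 2), ("a", "c", 3)]

# Candidate: vertex b active on [1, 2], vertex c active on [3, 3].
sol = {"b": [(1, 2)], "c": [(3, 3)]}
cost = covers(edges, sol)   # total interval length, or None if infeasible
```

Dropping c's interval leaves the edge (a, c, 3) uncovered, so the checker rejects that candidate; the hard part, which the paper analyzes parameterized by the four quantities above, is choosing the intervals optimally.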

#19 Modeling Precursors for Temporal Knowledge Graph Reasoning via Auto-encoder Structure [PDF] [Copy] [Kimi] [REL]

Authors: Yifu Gao ; Linhui Feng ; Zhigang Kan ; Yi Han ; Linbo Qiao ; Dongsheng Li

Temporal knowledge graph (TKG) reasoning that infers missing facts in the future is an essential and challenging task. When predicting a future event, there must be a narrative evolutionary process composed of closely related historical facts to support the event's occurrence, namely fact precursors. However, most existing models employ a sequential reasoning process in an auto-regressive manner, which cannot capture precursor information. This paper proposes a novel auto-encoder architecture that introduces a relation-aware graph attention layer into the Transformer (rGalT) to accommodate inference over the TKG. Specifically, we first calculate the correlation between historical and predicted facts through multiple attention mechanisms along intra-graph and inter-graph dimensions, then organize these mutually related facts into diverse fact segments. Next, we borrow the translation generation idea to decode in parallel the precursor information associated with the given query, which enables our model to infer future unknown facts by progressively generating graph structures. Experimental results on four benchmark datasets demonstrate that our model outperforms other state-of-the-art methods, and precursor identification provides supporting evidence for prediction.

#20 Self-supervised Graph Neural Networks for Multi-behavior Recommendation [PDF] [Copy] [Kimi] [REL]

Authors: Shuyun Gu ; Xiao Wang ; Chuan Shi ; Ding Xiao

Traditional recommendation usually focuses on utilizing only one target user behavior (e.g., purchase) while ignoring other auxiliary behaviors (e.g., click, add to cart). Early efforts in multi-behavior recommendation often emphasize the differences between multiple behaviors, i.e., they aim to extract useful information by distinguishing different behaviors. However, the commonality between them, which reflects users' common preference for items associated with different behaviors, is largely ignored. Meanwhile, multi-behavior recommendation still severely suffers from the issue of limited supervision signals. In this paper, we propose a novel self-supervised graph collaborative filtering model for multi-behavior recommendation named S-MBRec. Specifically, for each behavior, we execute GCNs to learn the user and item embeddings. Then we design a supervised task, distinguishing the importance of different behaviors, to capture the differences between embeddings. Meanwhile, we propose a star-style contrastive learning task to capture the embedding commonality between target and auxiliary behaviors, so as to alleviate the sparsity of supervision signals, reduce the redundancy among auxiliary behaviors, and extract the most critical information. Finally, we jointly optimize the above two tasks. Extensive experiments, in comparison with the state of the art, well demonstrate the effectiveness of S-MBRec, where the maximum improvement reaches 20%.

#21 Constrained Adaptive Projection with Pretrained Features for Anomaly Detection [PDF] [Copy] [Kimi] [REL]

Authors: Xingtai Gui ; Di Wu ; Yang Chang ; Shicai Fan

Anomaly detection aims to separate anomalies from normal samples, and pretrained networks are promising for anomaly detection. However, adapting the pretrained features is confronted with the risk of pattern collapse when finetuning on one-class training data. In this paper, we propose an anomaly detection framework called constrained adaptive projection with pretrained features (CAP). Combined with pretrained features, a simple linear projection head applied to a specific input and its k most similar pretrained normal representations is designed for feature adaptation, and a reformed self-attention is leveraged to mine the inner relationships among one-class semantic features. A loss function is proposed to avoid potential pattern collapse. Concretely, it considers the similarity between a specific sample and its corresponding adaptive normal representation, and incorporates a constraint term slightly aligning the pretrained and adaptive spaces. Our method achieves state-of-the-art anomaly detection performance on semantic and sensory anomaly detection benchmarks, including 96.5% AUROC on the CIFAR-100 dataset, 97.0% AUROC on the CIFAR-10 dataset and 89.9% AUROC on the MVTec dataset.

#22 Quaternion Ordinal Embedding [PDF] [Copy] [Kimi] [REL]

Authors: Wenzheng Hou ; Qianqian Xu ; Ke Ma ; Qianxiu Hao ; Qingming Huang

Ordinal embedding (OE) aims to project objects into a low-dimensional space while preserving their ordinal constraints as well as possible. Generally speaking, a reasonable OE algorithm should simultaneously capture a) semantic meaning and b) the ordinal relationship of the objects. However, most of the existing methods merely focus on b). To address this issue, our goal in this paper is to seek a generic OE method to embrace the two features simultaneously. We argue that different dimensions of vector-based embedding are naturally entangled with each other. To realize a), we expect to decompose the D dimensional embedding space into D different semantic subspaces, where each subspace is associated with a matrix representation. Unfortunately, introducing a matrix-based representation requires far more complex parametric space than its vector-based counterparts. Thanks to the algebraic property of quaternions, we are able to find a more efficient way to represent a matrix with quaternions. For b), inspired by the classic chordal Grassmannian distance, a new distance function is defined to measure the distance between different quaternions/matrices, on top of which we construct a generic OE loss function. Experimental results for different tasks on both simulated and real-world datasets verify the effectiveness of our proposed method.

#23 MERIT: Learning Multi-level Representations on Temporal Graphs [PDF] [Copy] [Kimi] [REL]

Authors: Binbin Hu ; Zhengwei Wu ; Jun Zhou ; Ziqi Liu ; Zhigang Huangfu ; Zhiqiang Zhang ; Chaochao Chen

Recently, representation learning on temporal graphs has drawn increasing attention, which aims at learning temporal patterns to characterize the evolving nature of dynamic graphs in real-world applications. Despite their effectiveness, these methods commonly ignore the individual- and combinatorial-level patterns derived from different types of interactions (e.g., user-item), which are at the heart of representation learning on temporal graphs. To fill this gap, we propose MERIT, a novel multi-level graph attention network for inductive representation learning on temporal graphs. We adaptively embed the original timestamps into a higher, continuous dimensional space for learning individual-level periodicity through a Personalized Time Encoding (PTE) module. Furthermore, we equip MERIT with a Continuous time and Context aware Attention (Coco-Attention) mechanism which chronologically locates the most relevant neighbors by jointly capturing multi-level context on temporal graphs. Finally, MERIT performs multiple aggregations and propagations to explore and exploit high-order structural information for downstream tasks. Extensive experiments on four public datasets demonstrate the effectiveness of MERIT on both (inductive / transductive) link prediction and node classification tasks.

#24 GraphDIVE: Graph Classification by Mixture of Diverse Experts [PDF] [Copy] [Kimi] [REL]

Authors: Fenyu Hu ; Liping Wang ; Qiang Liu ; Shu Wu ; Liang Wang ; Tieniu Tan

Graph classification is a challenging research task in many applications across a broad range of domains. Recently, Graph Neural Network (GNN) models have achieved superior performance on various real-world graph datasets. Despite their successes, most current GNN models largely suffer from the ubiquitous class imbalance problem, which typically results in prediction bias towards majority classes. Although many imbalanced learning methods have been proposed, they mainly focus on regular Euclidean data and cannot well utilize the topological structure of graph (non-Euclidean) data. To boost the performance of GNNs and investigate the relationship between topological structure and class imbalance, we propose GraphDIVE, which learns multi-view graph representations and combines multi-view experts (i.e., classifiers). Specifically, the multi-view graph representations correspond to the intrinsically diverse characteristics of graph topological structure. Extensive experiments on molecular benchmark datasets demonstrate the effectiveness of the proposed approach.

#25 End-to-End Open-Set Semi-Supervised Node Classification with Out-of-Distribution Detection [PDF] [Copy] [Kimi] [REL]

Authors: Tiancheng Huang ; Donglin Wang ; Yuan Fang ; Zhengyu Chen

Out-Of-Distribution (OOD) samples are prevalent in real-world applications. The OOD issue becomes even more severe on graph data, as the effect of OOD nodes can be potentially amplified by propagation through the graph topology. Recent works have considered the OOD detection problem, which is critical for reducing uncertainty in learning and improving robustness. However, no prior work considers OOD detection and node classification on graphs simultaneously in an end-to-end manner. In this paper, we study a novel problem of end-to-end open-set semi-supervised node classification (OSSNC) on graphs, which deals with node classification in the presence of OOD nodes. Given the lack of supervision on OOD nodes, we introduce a latent variable to indicate in-distribution or OOD nodes in a variational inference framework, and further propose a novel algorithm named Learning to Mix Neighbors (LMN) which learns to dampen the influence of OOD nodes through the message-passing of typical graph neural networks. Extensive experiments on various datasets show that the proposed method outperforms state-of-the-art baselines in terms of both node classification and OOD detection.