Networked time series are time series on a graph, one for each node, with applications in traffic and weather monitoring. Graph neural networks (GNNs) are natural candidates for networked time series imputation and have recently outperformed alternatives such as recurrent and generative models, as they exploit a relational inductive bias for imputation. However, existing GNN-based approaches fail to capture the higher-order topological structure between sensors, which is shaped by recurring substructures in the graph, referred to as temporal motifs. In addition, it remains unclear which motifs are the most pivotal in guiding the imputation task in networked time series. In this paper, we fill this gap by proposing a graph neural network that leverages motif structures within the network, employing weighted motif adjacency matrices to capture higher-order neighborhood information. In particular, (1) we design a motif-wise multi-view attention module that explicitly captures various higher-order structures, with an attention mechanism that automatically assigns high weights to informative ones so as to maximize the use of higher-order information. (2) We introduce a gated fusion module that merges gated recurrent networks and graph convolutional networks to capture spatial and temporal dependencies, reflecting the intricate interplay of temporal and spatial influence. Experimental results demonstrate that, compared to state-of-the-art models for time-series imputation, our proposed model reduces the error by around 19%.
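To make the higher-order structure concrete, here is a minimal sketch, assuming an undirected binary adjacency matrix and the triangle motif, of how a weighted motif adjacency matrix can be formed: entry (i, j) counts the motif instances containing edge (i, j). The function name and setup are illustrative, not the paper's implementation.

```python
import numpy as np

def triangle_motif_adjacency(A: np.ndarray) -> np.ndarray:
    """W[i, j] = number of triangles through edge (i, j); zero where no edge."""
    paths2 = A @ A          # paths2[i, j] = number of length-2 paths from i to j
    return paths2 * A       # keep counts only where an edge closes the triangle

A = np.array([[0, 1, 1, 0],
              [1, 0, 1, 1],
              [1, 1, 0, 0],
              [0, 1, 0, 0]])
print(triangle_motif_adjacency(A))  # edges (0,1), (0,2), (1,2) each lie in one triangle
```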
Efficiently capturing consistent and complementary semantic features in context is crucial for Multimodal Emotion Recognition in Conversations (MERC). However, limited by the over-smoothing and low-pass filtering characteristics of spatial graph neural networks, existing methods cannot accurately capture the long-distance consistent (low-frequency) and complementary (high-frequency) information of the utterances. To this end, this paper revisits the task of MERC from the perspective of the graph spectrum and proposes a Graph-Spectrum-based Multimodal Consistency and Complementary collaborative learning framework (GS-MCC). First, GS-MCC uses a sliding window to construct a multimodal interaction graph to model conversational relationships and designs efficient Fourier graph operators (FGOs) to extract long-distance high-frequency and low-frequency information. FGOs can be stacked in multiple layers, which effectively alleviates the over-smoothing problem. Then, GS-MCC uses contrastive learning to construct self-supervised signals that reflect complementary and consistent semantic collaboration between the high- and low-frequency signals, thereby improving the ability of the high- and low-frequency information to reflect genuine emotions. Finally, GS-MCC feeds the coordinated high- and low-frequency information into an MLP network and a softmax function for emotion prediction. Extensive experiments demonstrate the superiority of the proposed GS-MCC architecture on two benchmark datasets.
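As background for the high-/low-frequency split, the following sketch (an illustration under simplifying assumptions, not GS-MCC's actual FGO) separates a graph signal into smooth (low-frequency) and oscillatory (high-frequency) parts using the normalized Laplacian spectrum:

```python
import numpy as np

def split_frequencies(A, X, cutoff=1.0):
    """A: adjacency matrix; X: (n_nodes, dim) features; cutoff in [0, 2]."""
    d = A.sum(axis=1)
    D_inv_sqrt = np.diag(1.0 / np.sqrt(np.maximum(d, 1e-12)))
    L = np.eye(len(A)) - D_inv_sqrt @ A @ D_inv_sqrt  # eigenvalues lie in [0, 2]
    lam, U = np.linalg.eigh(L)
    X_hat = U.T @ X                                   # graph Fourier transform
    low = U @ (X_hat * (lam[:, None] < cutoff))       # consistent, smooth part
    high = U @ (X_hat * (lam[:, None] >= cutoff))     # complementary, oscillatory part
    return low, high
```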
The problem of forecasting spatiotemporal events such as crimes and accidents is crucial to public safety and city management. Besides accuracy, interpretability is also a key requirement for spatiotemporal forecasting models to justify the decisions. Merely presenting predicted scores fails to convince the public and does not contribute to future urban planning. Interpretation of the spatiotemporal forecasting mechanism is, however, challenging due to the complexity of multi-source spatiotemporal features, the non-intuitive nature of spatiotemporal patterns for non-expert users, and the presence of spatial heterogeneity in the data. Currently, no existing deep learning model intrinsically interprets the complex predictive process learned from multi-source spatiotemporal features. To bridge the gap, we propose GeoPro-Net, an intrinsically interpretable model for spatiotemporal event forecasting problems. GeoPro-Net introduces a novel Geo-concept convolution operation, which employs statistical tests to extract predictive patterns in the input as "Geo-concepts", and condenses the "Geo-concept-encoded" input through interpretable channel fusion and geographic-based pooling. In addition, GeoPro-Net learns different sets of prototypes of concepts inherently, and projects them to real-world cases for interpretation. Comprehensive experiments and case studies on four real-world datasets demonstrate that GeoPro-Net provides better interpretability while still achieving competitive prediction performance compared with state-of-the-art baselines.
Spatiotemporal Graph Learning (SGL) under Zero-Inflated Distribution (ZID) is crucial for urban risk management tasks, including crime prediction and traffic accident profiling. However, SGL models are vulnerable to adversarial attacks, compromising their practical utility. While adversarial training (AT) has been widely used to bolster model robustness, our study finds that traditional AT exacerbates performance disparities between majority and minority classes under ZID, potentially leading to irreparable losses due to the underreporting of critical risk events. In this paper, we first demonstrate that the smaller top-k gradients and lower separability of the minority class are key factors contributing to this disparity. To address these issues, we propose MinGRE, a framework for Minority Class Gradients and Representations Enhancement. MinGRE employs a multi-dimensional attention mechanism to reweight spatiotemporal gradients, minimizing gradient distribution discrepancies across classes. Additionally, we introduce an uncertainty-guided contrastive loss to improve the inter-class separability and intra-class compactness of minority representations with higher uncertainty. Extensive experiments demonstrate that the MinGRE framework not only significantly reduces the performance disparity across classes but also achieves enhanced robustness compared to existing baselines. These findings underscore the potential of our method in fostering the development of more equitable and robust models.
Recommendation systems (RS) play a crucial role in assisting decision-making but often suffer from either a lack of credibility or unfairness problems. A few recommendation models have endeavored to address one of these problems, but approaches that solve both remain to be explored. This paper aims to construct a generalized fairness-based recommendation framework that can also provide the credibility of recommendation models. Specifically, we propose a reliable and fair recommendation framework called Conformalized User Group Fairness (CUGF), inspired by conformal prediction. We construct dynamic prediction sets that are guaranteed to cover the true item with a user pre-specified probability to ensure credibility, while designing novel fairness metrics based on empirical risks to guarantee fairness for users across different groups. Furthermore, we design a novel CUGF algorithm to optimize the parameter γ that governs both the prediction sets and fairness. Finally, we conduct extensive experiments by applying CUGF on top of various recommendation models and representative datasets, and the results validate its effectiveness with respect to recommendation performance (in terms of average set size) and fairness (in terms of the two defined fairness metrics).
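For readers unfamiliar with conformal prediction, a minimal split-conformal sketch of the prediction-set construction follows; the calibration interface is an assumption, and CUGF's fairness-driven tuning of γ is omitted:

```python
import numpy as np

def conformal_sets(cal_scores_true, test_scores, alpha=0.1):
    """cal_scores_true: model score of each calibration user's true item.
    test_scores: (n_users, n_items) score matrix.
    Returns a boolean mask: prediction sets covering the true item
    with probability >= 1 - alpha (marginally, under exchangeability)."""
    n = len(cal_scores_true)
    level = min(1.0, np.ceil((n + 1) * (1 - alpha)) / n)
    q = np.quantile(-cal_scores_true, level, method="higher")  # conformal quantile
    return -test_scores <= q  # keep items whose nonconformity is below the threshold
```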
Multimedia event extraction aims to jointly extract structural event knowledge from multiple modalities, thus improving the comprehension and utilization of events in the growing body of multimedia content (e.g., multimedia news). A key challenge in multimedia event extraction is to establish cross-modal correlations during training without multimedia event annotations. Considering the complexity and cost of annotation across modalities, the multimedia event extraction task provides parallel annotated data only for evaluation. Previous works attempt to learn implicit correlations directly from unlabeled image-text pairs, but do not yield substantially better performance on event-centric tasks. To address this problem, we propose a cross-modal multi-task learning framework, X-MTL, that establishes cross-modal correlations at the task level and can simultaneously address four key tasks of multimedia event extraction: trigger detection, argument extraction, verb classification, and role classification. Specifically, to process inputs from different modalities and tasks, we utilize two separate modality-specific encoders and a modality-shared encoder to learn joint task representations, and we introduce textual and visual prompt learning methods to enrich and unify task inputs. To resolve task conflict in cross-modal multi-task learning, we propose a pseudo-label-based knowledge distillation method, combined with a dynamic weight adjustment method, which effectively lifts performance beyond that of separately-trained models. On the Multimedia Event Extraction benchmark M2E2, experimental results show that X-MTL surpasses the current state-of-the-art (SOTA) methods by 4.1% for multimedia event mention and 8.2% for multimedia argument role.
Traffic prediction is critical for optimizing travel scheduling and enhancing public safety, yet the complex spatial and temporal dynamics within traffic data present significant challenges for accurate forecasting. In this paper, we introduce a novel model, the Spatiotemporal-aware Trend-Seasonality Decomposition Network (STDN). This model begins by constructing a dynamic graph structure to represent traffic flow and incorporates novel spatio-temporal embeddings to jointly capture global traffic dynamics. The learned representations are further refined by a specially designed trend-seasonality decomposition module, which disentangles the trend-cyclical component and the seasonal component for each traffic node at different times within the graph. These components are subsequently processed through an encoder-decoder network to generate the final predictions. Extensive experiments conducted on real-world traffic datasets demonstrate that STDN achieves superior performance at a remarkably low computation cost. Furthermore, we have released a new traffic dataset named JiNan, which features unique inner-city dynamics, thereby enriching the scenario comprehensiveness of traffic prediction evaluation.
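As a point of reference for the decomposition (STDN's module is learned; this classical moving-average sketch is only illustrative, with an assumed 5-minute sampling rate):

```python
import numpy as np

def decompose(x, period=288, window=25):
    """x: 1-D series for one traffic node; 288 steps/day at 5-minute resolution."""
    pad = window // 2
    xp = np.pad(x, (pad, pad), mode="edge")
    trend = np.convolve(xp, np.ones(window) / window, mode="valid")  # trend-cyclical
    resid = x - trend
    seasonal = np.array([resid[i::period].mean() for i in range(period)])
    seasonal = np.tile(seasonal, len(x) // period + 1)[: len(x)]     # daily pattern
    return trend, seasonal
```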
Next POI recommendation aids users in predicting their destinations of interest and plays an increasingly vital role in location-based social services. Recent works focus on analyzing both long-term and short-term interests in POI recommendation to gain a deeper understanding of user profiles. However, these methods for modeling users' long-term sequences primarily rely on the Transformer model, which functions as a low-pass filter and often leads to the loss of high-frequency information. Additionally, long-term and short-term sequences are typically modeled independently, with short-term sequences often defined solely by the most recent check-ins, overlooking their interactions and dependencies. Therefore, we propose Enhancing Long- and Short-Term Representations for Next POI Recommendations via Frequency and Hierarchical Contrastive Learning (FHCRec). FHCRec captures both high-frequency and low-frequency information in long-term sequences to model richer long-term user preference representations. Moreover, it harnesses the characteristics of the short-term subsequences embedded within long-term sequences to enhance short-term preference characterization via local and global hierarchical contrastive learning, resulting in more personalized short-term preferences. The enhanced long-term and short-term preferences are integrated to improve recommendation performance. Extensive experiments on three real-world datasets demonstrate the effectiveness of our method.
Among various temporal knowledge graph (TKG) extrapolation methods, rule-based approaches stand out for their explicit rules and transparent reasoning paths. However, the vast search space for rule extraction poses a challenge in identifying high-quality logic rules. To navigate this challenge, we explore the use of generative models to generate new rules, thereby enriching our rule base and enhancing our reasoning capabilities. In this paper, we introduce LLM-DR, an innovative rule-based method for TKG extrapolation, which harnesses diffusion models to generate rules consistent with the distribution of the source data while also amalgamating the rich semantic insights of Large Language Models (LLMs). Specifically, our LLM-DR generates semantically relevant, high-quality rules by employing conditional diffusion models in a classifier-free guidance fashion and refining them with LLM-based constraints. To assess rule efficacy, we meticulously design a coarse-to-fine evaluation strategy that begins with coarse-grained filtering to eliminate less plausible rules and proceeds with fine-grained scoring to quantify the reliability of the retained rules. Extensive experiments demonstrate the promising capacity of our LLM-DR.
Stock prediction stands as a pivotal research objective within the Fintech domain. Existing deep learning research revolves around the development and scaling of a single neural network predictor. However, in the dynamic and noisy landscape of the stock market, reliance on a single predictor poses risks of limited adaptability to diverse market conditions and challenges in effectively integrating multi-source information. Besides, top-down teaching and bottom-up hierarchical decision-making paradigms are critical for robust and accurate stock prediction within successful quantitative firms; nonetheless, there is scarcely any research that integrates this workflow into stock prediction. To this end, we propose the Diffusion Generated Hierarchical Mixture-of-Experts (DHMoE) model to emulate such a workflow in stock prediction. Specifically, DHMoE is crafted as a three-layer tree structure, where each expert functions as a node within the tree and expert parameters are generated in a top-down, recursive manner. Recognizing the leading role of the top-level root expert, we harness the robust generative capabilities of diffusion models and introduce the Diffusion Inverted Transformer (DIT) as the root expert. The DIT is tailored to receive information from various modalities as conditional inputs and to allocate parameters to bottom-level experts. These bottom-level experts perform predictions specific to their respective input modalities, and their prediction results are then synthesized in a bottom-up manner, culminating in the final prediction outcomes. Experiments on three stock trading datasets reveal that DHMoE outperforms state-of-the-art methods in terms of both cumulative and risk-adjusted returns.
Spatial-temporal graph modeling is challenging due to the diverse node interactions across spatial and temporal dimensions. Recent studies typically adopt Graph Neural Networks (GNNs) to perform node-level aggregation at different time steps for node interaction modeling, acting as a series of low-pass graph spectral filters. However, these filters, confined to the spatial dimension, are ill-suited for processing signals of nodes with inherent spatial-temporal interdependencies. Moreover, oversimplified low-pass filtering fails to fully exploit information from diverse node interactions. To address these issues, we propose a Spatial-Temporal Spectral Graph Neural Network (STSGNN), which designs specialized two-dimensional (2-D) graph spectral filters for comprehensive spatial-temporal graph modeling. First, based on the normalized Laplacian spectrum of spatial and temporal graphs, we extend the existing graph spectral theory from a univariate spatial dimension to a bivariate spatial-temporal dimension through a 2-D Discrete Graph Fourier Transform (2-D DGFT). Then, we leverage the bivariate Bernstein polynomial approximation, with learned basis coefficients, to design 2-D filters with specialized spectral properties for unified spatial-temporal signal filtering. Finally, the filtered signals, with refined spatial-temporal representations, are fed into well-designed pyramidal gated convolution modules to acquire multiple ranges of spatial-temporal dependencies. Experiments on traffic and meteorological prediction tasks demonstrate that STSGNN achieves state-of-the-art performance. Additionally, we visualize the 2-D filters learned from inputs with distinct spatial-temporal characteristics to enhance the model's interpretability.
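The 2-D transform itself is compact; a minimal sketch (with the filtering step and Bernstein parameterization omitted) applies the spatial and temporal Laplacian eigenbases to the two axes of a node-by-time signal:

```python
import numpy as np

def laplacian_eigh(A):
    d = A.sum(axis=1)
    Dm = np.diag(1.0 / np.sqrt(np.maximum(d, 1e-12)))
    return np.linalg.eigh(np.eye(len(A)) - Dm @ A @ Dm)  # normalized Laplacian

def dgft_2d(X, A_space, A_time):
    """X: (n_nodes, n_steps) signal; returns its bivariate spectrum."""
    _, U_s = laplacian_eigh(A_space)
    _, U_t = laplacian_eigh(A_time)
    return U_s.T @ X @ U_t  # invert with U_s @ X_hat @ U_t.T
```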
POI representation learning plays a crucial role in handling tasks related to user mobility data. Recent studies have shown that enriching POI representations with multimodal information can significantly enhance their task performance. Previously, the textual information incorporated into POI representations typically involved only POI categories or check-in content, leading to relatively weak textual features in existing methods. In contrast, large language models (LLMs) trained on extensive text data have been found to possess rich textual knowledge. However, leveraging such knowledge to enhance POI representation learning presents two key challenges: first, how to effectively extract POI-related knowledge from LLMs, and second, how to integrate the extracted information to enhance POI representations. To address these challenges, we propose POI-Enhancer, a portable framework that leverages LLMs to improve POI representations produced by classic POI learning models. We first design three specialized prompts to efficiently extract semantic information from LLMs. Then, the Dual Feature Alignment module enhances the quality of the extracted information, while the Semantic Feature Fusion module preserves its integrity. The Cross Attention Fusion module then adaptively integrates such high-quality information into POI representations, and Multi-View Contrastive Learning further injects human-understandable semantic information into these representations. Extensive experiments on three real-world datasets demonstrate the effectiveness of our framework, showing significant improvements across all baseline representations.
Generative document retrieval is a novel retrieval framework which represents documents as identifiers (DocIDs) and retrieves documents by generating DocIDs. It has the advantage of end-to-end optimization over traditional retrieval methods and has attracted much research interest. Nonetheless, the development of efficient and precise DocIDs for document representation remains a pertinent issue within the field. Existing methods for designing DocIDs tend to consider only the relevance of DocIDs to the corresponding documents, while neglecting the ability of the DocIDs to distinguish the corresponding documents from similar ones, which is crucial for the retrieval task. In this paper, we design learnable descriptive and discriminative document Identifiers (D2-DocID) for generative retrieval and propose the paired retrieval model D2Gen. A D2-DocID is semantically similar to its corresponding document (descriptive) and able to distinguish that document from similar ones in the corpus (discriminative), thus enhancing retrieval performance. We use a contrastive-learning-assisted generative retrieval task to enable the model to understand the documents and then complete the generative retrieval. We then design a DocID selection method that selects DocIDs based on the retrieval model's understanding of the documents. Our experimental results on the MS MARCO and NQ320k datasets illustrate the effectiveness of the approach.
Micro-video popularity prediction (MVPP) plays a crucial role in various downstream applications. Recently, multimodal methods that integrate multiple modalities to predict popularity have exhibited impressive performance. However, these methods face several unresolved issues: (1) limited contextual information and (2) incomplete modal semantics. Incorporating relevant videos and performing full fine-tuning on pre-trained models can typically address these issues, but this paradigm is not optimal due to its weak transferability and scarce downstream data. Inspired by prompt learning, we propose ICPF, a novel In-Context Prompt-augmented Framework to enhance popularity prediction. ICPF maintains a model-agnostic design, facilitating seamless integration with various multimodal fusion models. Specifically, the multi-branch retriever first retrieves similar modal content through within-modality similarities. Next, the in-context prompt generator extracts semantic prior features from retrieved videos and generates in-context prompts, enriching pre-trained models with valuable contextual knowledge. Finally, the knowledge-augmented predictor captures complementary features, including modal semantics and popularity information. Extensive experiments conducted on three real-world datasets demonstrate the superiority of ICPF over 14 competitive baselines.
Given the ubiquity of multi-task scenarios in practical systems, Multi-Task Learning (MTL) has found widespread application across diverse domains. In real-world scenarios, these tasks often have different priorities. For instance, in web search, relevance is often prioritized over other metrics, such as click-through rates or user engagement. Existing frameworks pay insufficient attention to the prioritization among different tasks; they typically adjust task-specific loss function weights to differentiate task priorities. However, this approach encounters challenges as the number of tasks grows, leading to exponential increases in hyper-parameter tuning complexity. Furthermore, the simultaneous optimization of multiple objectives can negatively impact the performance of high-priority tasks due to interference from lower-priority tasks. In this paper, we introduce a novel multi-task learning framework employing Lagrangian Differential Multiplier Methods for step-wise multi-task optimization. It is designed to boost the performance of high-priority tasks without interference from other tasks. Its primary advantage lies in its ability to automatically optimize multiple objectives without requiring balancing hyper-parameters for different tasks, thereby eliminating the need for manual tuning. Additionally, we provide a theoretical analysis demonstrating that our method ensures optimization guarantees, enhancing the reliability of the process. We demonstrate its effectiveness through experiments on multiple public datasets and through its application in Taobao search, a large-scale industrial search ranking system, where it yields significant improvements across various business metrics.
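A minimal PyTorch sketch of the optimization idea, under the assumption that the model exposes per-task losses ordered by priority: descend on the high-priority loss plus multiplier-weighted constraint terms, and ascend on the multipliers. This illustrates the Lagrangian mechanism, not the paper's exact procedure:

```python
import torch

def priority_lagrangian_step(model, opt, batch, budgets, lam, lam_lr=0.01):
    """budgets: allowed levels for secondary losses; lam: list of multipliers.
    model.task_losses(batch) is an assumed interface returning
    [L_primary, L_2, L_3, ...] ordered by task priority."""
    losses = model.task_losses(batch)
    lagrangian = losses[0] + sum(
        l * (L - b) for l, L, b in zip(lam, losses[1:], budgets))
    opt.zero_grad()
    lagrangian.backward()
    opt.step()  # gradient descent on model parameters
    with torch.no_grad():  # gradient ascent on the multipliers
        for i, (L, b) in enumerate(zip(losses[1:], budgets)):
            lam[i] = max(0.0, lam[i] + lam_lr * (L.item() - b))
    return lam
```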
Outlier detection (OD) is the task of identifying unusual observations (outliers) in given or upcoming data by learning the unique patterns of normal observations (inliers). Recently, a study introduced a powerful unsupervised OD (UOD) solver based on a new observation about deep generative models, called the inlier-memorization (IM) effect, which suggests that generative models memorize inliers before outliers in early learning stages. In this study, we aim to develop a theoretically principled method that addresses UOD tasks by maximally utilizing the IM effect. We begin by noting that the IM effect appears more clearly when the given training data contain fewer outliers. This finding indicates a potential for enhancing the IM effect in UOD regimes if we can effectively exclude outliers from mini-batches when designing the loss function. To this end, we introduce two main techniques: 1) increasing the mini-batch size as model training proceeds and 2) using an adaptive threshold to calculate the truncated loss function. We theoretically show that these two techniques effectively filter outliers out of the truncated loss function, allowing us to utilize the IM effect to the fullest. Coupled with an additional ensemble technique, we propose our method and term it Adaptive Loss Truncation with Batch Increment (ALTBI). We provide extensive experimental results demonstrating that ALTBI achieves state-of-the-art performance in identifying outliers compared to other recent methods, even with lower computation costs. Additionally, we show that our method yields robust performance when combined with privacy-preserving algorithms.
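The two techniques are easy to state in code. A minimal sketch, with the growth schedule and quantile rule as assumptions (the paper's exact schedules and ensemble are omitted):

```python
import torch

def truncated_loss(per_sample_loss: torch.Tensor, keep_ratio: float) -> torch.Tensor:
    """Average only the smallest losses; large losses are suspected outliers."""
    thresh = torch.quantile(per_sample_loss, keep_ratio)  # adaptive threshold
    return per_sample_loss[per_sample_loss <= thresh].mean()

def batch_size_schedule(step: int, start=128, growth=1.03, cap=4096) -> int:
    """Mini-batch size increases as training proceeds."""
    return min(int(start * growth**step), cap)
```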
Unsupervised anomaly detection (UAD) plays an important role in modern data analytics, and it is crucial to provide simple yet effective and guaranteed UAD algorithms for real applications. In this paper, we present a novel UAD method for tabular data that works by evaluating how much noise is in the data. Specifically, we propose to learn a deep neural network from the clean (normal) training dataset and a noisy dataset, where the latter is generated by adding highly diverse noises to the clean data. The neural network can learn a reliable decision boundary between normal and anomalous data when the diversity of the generated noisy data is sufficiently high, so that hard abnormal samples lie in the noisy region. Importantly, we provide theoretical guarantees, proving that the proposed method can detect anomalous data successfully even though it does not utilize any real anomalous data in the training stage. Extensive experiments on more than 60 benchmark datasets demonstrate the effectiveness of the proposed method in comparison with 12 UAD baselines. Our method obtains a 92.27% AUC score and a 1.68 ranking score on average. Moreover, compared to state-of-the-art UAD methods, our method is easier to implement.
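A minimal sketch of the recipe, with the noise families (Gaussian jitter and uniform resampling) chosen for illustration rather than taken from the paper:

```python
import numpy as np
from sklearn.neural_network import MLPClassifier

def fit_noise_discriminator(X_clean, rng=np.random.default_rng(0)):
    scale = X_clean.std(axis=0, keepdims=True)
    noisy = np.concatenate([
        X_clean + rng.normal(0.0, 1.0, X_clean.shape) * scale,  # Gaussian jitter
        rng.uniform(X_clean.min(axis=0), X_clean.max(axis=0),   # uniform resampling
                    X_clean.shape),
    ])
    X = np.concatenate([X_clean, noisy])
    y = np.concatenate([np.zeros(len(X_clean)), np.ones(len(noisy))])
    clf = MLPClassifier(hidden_layer_sizes=(64, 64), max_iter=300).fit(X, y)
    return clf  # clf.predict_proba(X_test)[:, 1] then serves as an anomaly score
```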
By generating new yet effective data, data augmentation has become a promising method to mitigate the data sparsity problem in sequential recommendation. Existing works focus on augmenting the original data but rarely explore the issue of imbalanced relevance and diversity in the augmented data, leading to semantic drift problems or limited performance improvements. In this paper, we propose a novel Balanced data Augmentation Plugin for Sequential Recommendation (BASRec) that generates data balancing relevance and diversity. BASRec consists of two modules: Single-sequence Augmentation and Cross-sequence Augmentation. The former leverages the randomness of heuristic operators to generate diverse sequences for a single user, after which the diverse and original sequences are fused at the representation level to obtain relevance. Further, we devise a reweighting strategy that enables the model to adaptively learn preferences based on the two properties. The Cross-sequence Augmentation module performs nonlinear mixing between different sequence representations from two directions. It produces virtual sequence representations that are diverse enough yet retain the vital semantics of the original sequences. Together, these two modules enable the model to discover fine-grained preference knowledge from single-user and cross-user perspectives. Extensive experiments verify the effectiveness of BASRec, with average improvements of up to 72.0% on GRU4Rec, 33.8% on SASRec, and 68.5% on FMLP-Rec. We demonstrate that BASRec generates data with a better balance between relevance and diversity than existing methods.
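To illustrate representation-level mixing between users, here is a simplified linear Mixup-style stand-in for BASRec's nonlinear mixing; the Beta parameterization is an assumption:

```python
import torch

def mix_representations(h_a: torch.Tensor, h_b: torch.Tensor, alpha: float = 0.2):
    """h_a, h_b: (seq_len, dim) sequence representations of two users."""
    lam = torch.distributions.Beta(alpha, alpha).sample()
    return lam * h_a + (1.0 - lam) * h_b  # virtual but semantics-preserving sequence
```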
Modeling geospatial tabular data with deep learning has become a promising alternative to traditional statistical and machine learning approaches. However, existing deep learning models often face challenges related to scalability and flexibility as datasets grow. To this end, this paper introduces GeoAggregator, an efficient and lightweight transformer-based algorithm designed specifically for geospatial tabular data modeling. GeoAggregator explicitly accounts for spatial autocorrelation and spatial heterogeneity through Gaussian-biased local attention and global positional awareness. Additionally, we introduce a new attention mechanism that uses the Cartesian product to manage the size of the model while maintaining strong expressive power. We benchmark GeoAggregator against spatial statistical models, XGBoost, and several state-of-the-art geospatial deep learning methods on both synthetic and empirical geospatial datasets. The results demonstrate that GeoAggregator achieves the best or second-best performance compared to its competitors on nearly all datasets. GeoAggregator's efficiency is underscored by its reduced model size, making it both scalable and lightweight. Moreover, ablation experiments offer insights into the effectiveness of the Gaussian bias and the Cartesian attention mechanism, providing recommendations for further optimizing GeoAggregator's performance.
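A minimal sketch of Gaussian-biased attention (the parameterization is illustrative): attention logits are penalized by squared geographic distance, so nearby observations dominate, reflecting spatial autocorrelation:

```python
import torch

def gaussian_biased_attention(q, k, v, xy_q, xy_k, sigma=1.0):
    """q: (n, d); k, v: (m, d); xy_q: (n, 2), xy_k: (m, 2) coordinates."""
    d2 = torch.cdist(xy_q, xy_k) ** 2  # squared geographic distances
    logits = (q @ k.T) / q.shape[-1] ** 0.5 - d2 / (2.0 * sigma**2)
    return torch.softmax(logits, dim=-1) @ v
```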
Knowledge Graphs (KGs) are structured data presented as directed graphs. Due to the common issues of incompleteness and inaccuracy encountered during construction and maintenance, completing KGs becomes a critical task. Inductive Knowledge Graph Completion (KGC) excels at inferring patterns or models from seen data to be applied to unseen data. However, existing methods mainly focus on new entities, while relations are usually randomly initialized. To this end, we propose TARGI, a simple yet effective inductive method for KGC. Specifically, we first construct a global relation graph for each topology from a global graph perspective, thus leveraging the invariance of relation structures. We then utilize this graph to aggregate rich embeddings of new relations and new entities, thereby performing KGC robustly in inductive scenarios. This successfully addresses the excessive reliance on the degree of relations and resolves the high complexity and limited scope of enclosing-subgraph sampling in existing fully inductive algorithms. We conduct KGC experiments on six inductive datasets using inference data in which all entities are new and the ratio of new relations is 100 percent, 50 percent, or 0 percent. Extensive results demonstrate that our model accurately learns the topological structures and embeddings of new relations, and guides the embedding learning of new entities. Notably, our model outperforms 15 SOTA methods, especially on two fully inductive datasets.
Hubs are a few points that frequently appear among the k-nearest neighbors (kNN) of many other points in a high-dimensional dataset. The hubs' effects, called the hubness phenomenon, degrade the performance of kNN-based models in high dimensions. We present SamHub, a simple sampling approach that efficiently identifies hubs with theoretical guarantees. Unlike previous works based on approximate kNN indexes, SamHub is generic and applicable to any distance measure with a negligible additional memory footprint. Empirically, by sampling only 10% of points, SamHub runs significantly faster and offers higher accuracy than existing hub detection methods on many real-world datasets with dot product, L1, L2, and dynamic time warping distances. Our ablation studies of SamHub on improving kNN-based classification show its potential for other high-dimensional data analysis tasks.
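The core estimator can be stated in a few lines. A minimal brute-force sketch of sampling-based hub detection follows (SamHub's actual estimator and guarantees differ in detail):

```python
import numpy as np

def sample_hubs(X, k=10, sample_frac=0.1, n_hubs=5, rng=np.random.default_rng(0)):
    """Estimate k-occurrence counts from a sample of queries; return top hubs."""
    n = len(X)
    sample = rng.choice(n, size=max(1, int(sample_frac * n)), replace=False)
    counts = np.zeros(n, dtype=int)
    for i in sample:
        d = np.linalg.norm(X - X[i], axis=1)    # any distance measure works here
        d[i] = np.inf                           # exclude the query itself
        counts[np.argpartition(d, k)[:k]] += 1  # tally kNN memberships
    return np.argsort(-counts)[:n_hubs]
```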
Neural networks remain black-box systems that are unsure about their outputs, and their performance may drop unpredictably in real applications. An open question is how to qualitatively extend neural networks so that they are sure about their reasoning results, or reasoning-for-sure. Here, we introduce set-theoretic relations explicitly and seamlessly into neural networks by extending vector embeddings into sphere embeddings, so that part-whole relations between spheres can explicitly encode set-theoretic relations through sphere boundaries in the vector space. A reasoning-for-sure neural network successfully constructs, within a constant number M of epochs, a sphere configuration as its semantic model for any consistent set-theoretic relation. We implement the Hyperbolic Sphere Neural Network (HSphNN), the first reasoning-for-sure neural network for all types of Aristotelian syllogistic reasoning. Its construction process is realised as a sequence of neighbourhood transitions from the current configuration towards the target configuration. We prove M=1 for HSphNN. In experiments, HSphNN achieves the symbolic-level rigour of syllogistic reasoning and successfully checks both decisions and explanations of ChatGPT (gpt-3.5-turbo and gpt-4o) without errors. Through prompts, HSphNN improves the performance of gpt-3.5-turbo from 46.875% to 58.98%, and of gpt-4o from 82.42% to 84.76%. We show ways to extend HSphNN to various kinds of logical and Bayesian reasoning and to integrate it seamlessly with traditional neural networks.
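For intuition about how sphere boundaries encode set-theoretic relations, consider the Euclidean case (HSphNN itself works with hyperbolic spheres): "all S1 are S2" holds iff sphere S1 lies inside sphere S2, and "no S1 are S2" holds iff the spheres are disjoint:

```python
import numpy as np

def inside(c1, r1, c2, r2):
    """Sphere (c1, r1) is contained in sphere (c2, r2)."""
    return np.linalg.norm(c1 - c2) + r1 <= r2

def disjoint(c1, r1, c2, r2):
    """The two spheres share no points."""
    return np.linalg.norm(c1 - c2) >= r1 + r2
```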
Large language models (LLMs) provide a promising way to achieve accurate session-based recommendation (SBR), but they demand substantial computational time and memory. Knowledge distillation (KD)-based methods can alleviate these issues by training a small student on the predictions of a cumbersome teacher. However, these methods encounter difficulties with LLM-based KD in SBR: 1) it is expensive to make LLMs predict for all instances in KD, and 2) LLMs may make ineffective predictions for some instances, e.g., incorrect predictions for hard instances or predictions similar to those of existing recommenders for easy instances. In this paper, we propose an active LLM-based KD method for SBR, contributing to sustainable AI. To efficiently distill knowledge from LLMs at limited cost, we propose to extract a small proportion of instances to be predicted by LLMs. Meanwhile, for more effective distillation, we propose an active learning strategy that extracts instances that are as effective as possible for KD, from a theoretical view. Specifically, we first formulate gains based on the potential effects (e.g., effective, similar, or incorrect predictions by LLMs) and difficulties (e.g., easy or hard to fit) of instances for KD. Then, we propose to maximize the minimal gains of distillation to find the optimal selection policy for active learning, which largely avoids extracting ineffective instances in KD. Experiments on real-world datasets show that our method significantly outperforms state-of-the-art methods for SBR.
Fraud is increasingly prevalent, and its patterns are frequently changing, posing challenges for fraud detection methods such as random forests and Graph Neural Networks (GNNs), which rely on bin-based and mixture features, respectively. The former may lose crucial graph-associated features, while the latter face incorrect feature fusion. To overcome these limitations, we propose an attribute-association-pattern-based approach that leverages the distinct attribute and association patterns differentiating fraudulent from benign behaviors to enhance fraud detection capabilities. Attribute features are adaptively split into separate bins to eliminate incorrect attribute fusion and are combined with association patterns through graph neighbor message passing, thereby deriving attribute-association pattern features. Using the learned attribute-association patterns, fraud patterns are globally aggregated, from single patterns to patterns across the entire graph. Extensive experiments comparing our approach with 24 methods on 7 datasets demonstrate that the proposed method achieves SOTA performance.
Tabular data, widely used across industries, remains underexplored in deep learning. Self-supervised learning (SSL) shows promise for pre-training deep neural networks (DNNs) on tabular data, but its potential is hindered by challenges in designing suitable augmentations. Unlike image and text data, where SSL leverages inherent spatial or semantic structures, tabular data lacks such explicit structure. This makes traditional input-level augmentations, like modifying or removing features, less effective due to difficulties in balancing critical information preservation with variability. To address these challenges, we propose RaTab, a novel method that shifts augmentation from input-level to representation-level using matrix factorization, specifically truncated SVD. This approach preserves essential data structures while generating diverse representations by applying dropout at various stages of the representation, thereby significantly enhancing SSL performance for tabular data.
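A minimal sketch of the representation-level augmentation, with the dropout placement and scaling as assumptions for illustration:

```python
import numpy as np

def svd_views(X, rank=16, p_drop=0.2, n_views=2, rng=np.random.default_rng(0)):
    """Factorize the table with truncated SVD, then create augmented views
    via dropout on the low-rank representation (positive pairs for SSL)."""
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    Z = U[:, :rank] * s[:rank]                 # truncated-SVD representation
    views = []
    for _ in range(n_views):
        mask = rng.random(Z.shape) > p_drop    # representation-level dropout
        views.append(np.where(mask, Z, 0.0) / (1.0 - p_drop))
    return views
```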