Stock investment recommendation is crucial for guiding investment decisions and managing portfolios. Recent studies have demonstrated the potential of temporal-relational models (TRMs) to yield excess investment returns. However, in the complicated finance ecosystem, current TRMs suffer from both an intrinsic temporal bias stemming from the low signal-to-noise ratio (SNR) and a relational bias caused by inappropriate relational topologies and propagation mechanisms. Moreover, the distribution shifts behind macro-market scenarios invalidate the underlying i.i.d. assumption and limit the generalization ability of TRMs. In this paper, we pioneer the study of how these issues affect the effective learning of temporal-relational patterns and propose an Automatic De-Biased Temporal-Relational Model (ADB-TRM) for stock recommendation. Specifically, ADB-TRM consists of three main components: (i) a meta-learned architecture that forms a dual-stage training process, with the inner part ameliorating temporal-relational bias and the outer meta-learner counteracting distribution shifts; (ii) automatic adversarial sample generation, which adaptively guides the model to alleviate bias and enhance its profiling ability through adversarial training; and (iii) global-local interaction, which seeks relatively invariant stock embeddings from local and global distribution perspectives to mitigate distribution shifts. Experiments on three datasets from distinct stock markets show that ADB-TRM outperforms state-of-the-art methods by 28.41% and 9.53% in terms of cumulative and risk-adjusted returns, respectively.
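The adversarial component lends itself to a compact sketch. Below is a minimal, hypothetical inner-loop step in PyTorch, assuming a generic `model` and `loss_fn`: it generates FGSM-style adversarial stock features along the loss gradient and trains on both clean and perturbed samples. The paper's automatic generation is more elaborate, but the de-biasing mechanism it describes has this shape.

```python
import torch

def adversarial_inner_step(model, x, y, loss_fn, eps=0.01):
    # Clean forward pass; keep the graph so we can differentiate w.r.t. x.
    x = x.clone().requires_grad_(True)
    clean_loss = loss_fn(model(x), y)
    grad, = torch.autograd.grad(clean_loss, x, retain_graph=True)
    # FGSM-style perturbation of the stock features (assumed generation rule).
    x_adv = (x + eps * grad.sign()).detach()
    adv_loss = loss_fn(model(x_adv), y)
    # Inner objective: fit clean data while staying robust to biased samples.
    return clean_loss + adv_loss
```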
Occupational skill demand (OSD) forecasting seeks to predict dynamic skill demand specific to occupations, helping employees and employers grasp occupational nature and maintain a competitive edge in the rapidly evolving labor market. Although recent research has proposed data-driven techniques for forecasting skill demand, the focus has remained predominantly on overall trends rather than occupational granularity. In this paper, we propose a novel Pre-training Enhanced Dynamic Graph Autoencoder (Pre-DyGAE), forecasting skill demand from an occupational perspective. Specifically, we aggregate job descriptions (JDs) by occupation and segment them into several timestamps. Subsequently, on the initial timestamps, we pre-train a graph autoencoder (GAE), consisting of a semantically-aware, cross-attention enhanced, uncertainty-aware encoder and decoders for link prediction and edge regression to achieve graph reconstruction. In particular, we utilize contrastive learning on skill co-occurrence clusters to address data sparsity, and a unified Tweedie and ranking loss to model the imbalanced demand distribution. Afterward, we incorporate an adaptive temporal encoding unit and a temporal shift module into the GAE to obtain a dynamic GAE (DyGAE). Furthermore, we fine-tune the DyGAE with a two-stage optimization strategy and infer future representations. Extensive experiments on four real-world datasets validate the effectiveness of Pre-DyGAE compared with state-of-the-art baselines.
Multi-modality spatio-temporal (MoST) data extends spatio-temporal (ST) data by incorporating multiple modalities, and is prevalent in monitoring systems encompassing diverse traffic demands and air quality assessments. Despite significant strides in ST modeling in recent years, harnessing information from different modalities remains underexploited. Robust MoST forecasting is more challenging because it possesses (i) high-dimensional and complex internal structures and (ii) dynamic heterogeneity caused by temporal, spatial, and modality variations. In this study, we propose MoSSL, a novel MoST learning framework via Self-Supervised Learning, which aims to uncover latent patterns from temporal, spatial, and modality perspectives while quantifying dynamic heterogeneity. Experiment results on two real-world MoST datasets verify the superiority of our approach compared with state-of-the-art baselines. Model implementation is available at https://github.com/beginner-sketch/MoSSL.
Graph Convolutional Networks (GCNs) have become pivotal in recommendation systems for learning user and item embeddings by leveraging the node information and topology of the user-item interaction graph. However, these models often face the well-known over-smoothing issue, leading to indistinct user and item embeddings and reduced personalization. Traditional desmoothing methods in GCN-based systems are model-specific, lacking a universal solution. This paper introduces a novel, model-agnostic approach named Desmoothing Framework for GCN-based Recommendation Systems (DGR). It effectively addresses over-smoothing in general GCN-based recommendation models by considering both global and local perspectives. Specifically, we first introduce vector perturbations during each message-passing layer to penalize the tendency of node embeddings to become overly similar, guided by the global topological structure. Meanwhile, we further develop a tailor-designed loss term for the readout embeddings to preserve the local collaborative relations between users and their neighboring items. In particular, items that exhibit a high correlation with neighboring items are also incorporated to enhance the local topological information. To validate our approach, we conduct extensive experiments on 5 benchmark datasets based on 5 well-known GCN-based recommendation models, demonstrating the effectiveness and generalization of our proposed framework. Our code is available at https://github.com/me-sonandme/DGR.
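To make the global-perspective idea concrete, here is a minimal sketch of one desmoothing message-passing layer, assuming a normalized adjacency `adj_norm` and a hypothetical strength parameter `beta`; the exact perturbation in DGR is guided by the global topology rather than the plain mean used here.

```python
import torch

def desmooth_layer(h, adj_norm, beta=0.1):
    # Standard LightGCN-style propagation step.
    h = adj_norm @ h
    # Push each embedding away from the global center, counteracting the
    # drift toward a single over-smoothed point (illustrative perturbation).
    global_mean = h.mean(dim=0, keepdim=True)
    return h + beta * (h - global_mean)
```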
Recent empirical evidence suggests that real-world networks have very low underlying dimensionality. We provide a theoretical explanation for this phenomenon as well as develop a linear-time algorithm for detecting the underlying dimensionality of such networks. Our theoretical analysis considers geometric inhomogeneous random graphs (GIRGs), a geometric random graph model, which captures a variety of properties observed in real-world networks. These properties include a heterogeneous degree distribution and a non-vanishing clustering coefficient, which is the probability that two random neighbors of a vertex are adjacent. Our first result shows that the clustering coefficient of GIRGs scales inverse exponentially with respect to the number of dimensions d, when the latter is at most logarithmic in n, the number of vertices. Hence, for a GIRG to behave like many real-world networks and have a non-vanishing clustering coefficient, it must come from a geometric space of o(log n) dimensions. Our analysis of GIRGs allows us to obtain a linear-time algorithm for determining the dimensionality of a network. Our algorithm bridges the gap between theory and practice, as it comes with a rigorous proof of correctness and yields results comparable to prior empirical approaches, as indicated by our experiments on real-world instances. The efficiency of our algorithm makes it applicable to very large datasets. We conclude that very low dimensionalities (from 1 to 10) are needed to explain properties of real-world networks.
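As a toy illustration of how such an inverse-exponential relationship can be inverted to estimate dimensionality, consider the following sketch. The constants `c1` and `q` are made-up placeholders here; the paper's algorithm derives the relationship rigorously and runs in linear time, unlike the exact clustering computation from `networkx` used below.

```python
import math
import networkx as nx

def estimate_dimension(G, c1=0.5, q=0.6):
    # Observed probability that two random neighbors of a vertex are adjacent.
    c = nx.average_clustering(G)
    if c <= 0:
        return None  # no clustering signal: consistent with high dimension
    # Assume a calibrated scaling c(d) ~ c1 * q**d and solve for d.
    return max(1, round(math.log(c / c1) / math.log(q)))
```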
Open set recognition is a crucial research theme for open-environment machine learning. For this problem, a common solution is to learn compact representations of known classes and identify unknown samples by measuring deviations from these known classes. However, the aforementioned methods (1) lack open training consideration, which is detrimental to the fitting of known classes, and (2) recognize unknown classes on an inadequate basis, which limits the accuracy of recognition. In this study, we propose an open reconstruction learning framework that learns a union boundary region of known classes to characterize unknown space. This facilitates the isolation of known space from unknown space, representing known classes compactly and providing a more reliable recognition basis from the perspective of both known and unknown space. Specifically, an adversarial constraint is used to generate class-specific boundary samples. Then, the known classes and approximate unknown space are fitted with manifolds represented by class-specific auto-encoders. Finally, the auto-encoders output the reconstruction error in terms of known and unknown spaces to recognize samples. Extensive experimental results show that the proposed method outperforms existing advanced methods and achieves new state-of-the-art performance. The code is available at https://github.com/Ashowman98/CSGRL.
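The final recognition rule can be sketched compactly. Assuming a list of trained class-specific auto-encoders `class_aes` and a boundary-calibrated `threshold` (both hypothetical names), a sample is assigned to the known class with the smallest reconstruction error, or rejected as unknown:

```python
import torch

def recognize(x, class_aes, threshold):
    # Reconstruction error of x under each class-specific auto-encoder.
    errors = torch.stack([((ae(x) - x) ** 2).mean() for ae in class_aes])
    best = int(errors.argmin())
    # Reject as unknown when even the best-fitting known manifold is far.
    return best if errors[best] < threshold else -1  # -1 denotes "unknown"
```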
Collaborative filtering (CF) methods for recommendation systems have been extensively researched, ranging from matrix factorization and autoencoder-based to graph filtering-based methods. Recently, lightweight methods that require almost no training have been proposed to reduce overall computation. However, existing methods still have room to improve the trade-offs among accuracy, efficiency, and robustness. In particular, there are no well-designed closed-form studies for balanced CF in terms of the aforementioned trade-offs. In this paper, we design SVD-AE, a simple yet effective singular value decomposition (SVD)-based linear autoencoder, whose closed-form solution can be defined based on SVD for CF. SVD-AE does not require iterative training processes, as its closed-form solution can be calculated at once. Furthermore, given the noisy nature of the rating matrix, we explore the robustness of existing CF methods and our SVD-AE against such noisy interactions. As a result, we demonstrate that our simple design choice based on truncated SVD can be used to strengthen the noise robustness of the recommendation while improving efficiency. Code is available at https://github.com/seoyoungh/svd-ae.
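A minimal sketch of the truncated-SVD idea, assuming a dense user-item rating matrix `R` (the paper's exact closed form may include additional normalization or regularization): the top-k singular subspace yields a denoised score matrix in one shot, with no iterative training.

```python
import numpy as np
from scipy.sparse.linalg import svds

def svd_scores(R, k=64):
    # Top-k singular triplets of the rating matrix (requires k < min(R.shape)).
    U, s, Vt = svds(R.astype(np.float64), k=k)
    # Rank-k reconstruction: a closed-form, denoised score matrix whose
    # rows can be sorted to rank unseen items for each user.
    return (U * s) @ Vt
```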
The Variable Subset Forecasting (VSF) problem, where the majority of variables are unavailable at the inference stage of multivariate forecasting, is an important but under-explored task with broad impact in many real-world applications. Missing values, absent inter-correlations, and the impracticality of retraining largely hinder the ability of multivariate forecasting models to capture the inherent relationships among variables, impacting their performance. Existing approaches to these issues either rely heavily on local temporal correlation or face limitations in fully recovering missing information from the unavailable subset, accompanied by notable computational expense. To address these problems, we propose a novel density estimation solution that recovers the information of missing variables via a flow-based generative framework. In particular, a novel generative network for time series, namely Time-series Reconstruction Flows (TRF), is proposed to estimate and reconstruct the missing variable subset. In addition, a novel meta-training framework, Variable-Agnostic Meta Learning, is developed to enhance the generalization ability of TRF, enabling it to adapt to diverse missing-variable situations. Finally, extensive experiments demonstrate the superiority of our proposed method compared with baseline methods.
Many Directed Graph Neural Networks (DGNNs) are designed to treat all nodes in the same neighbor set (i.e., the out-neighbor set and the in-neighbor set) equally for every node, without considering node diversity in directed graphs, so they often fail to adaptively acquire suitable information from neighbors of different directions. To alleviate this issue, in this paper, we investigate a new way to first consider node diversity for representation learning on directed graphs, i.e., neighbor diversity and degree diversity, and then propose a new NDDGNN framework to adaptively assign weights to both outgoing and incoming information at the node level. Extensive experiments on seven real-world datasets validate the superior performance of our method compared to state-of-the-art methods on both node classification and link prediction tasks.
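A hypothetical sketch of node-level directional weighting, assuming normalized adjacency matrices `adj_out` and `adj_in`: a learned per-node gate decides how much to take from each direction, rather than treating both neighbor sets equally.

```python
import torch
import torch.nn as nn

class DirectionalAggregation(nn.Module):
    def __init__(self, dim):
        super().__init__()
        self.gate = nn.Linear(2 * dim, 1)  # per-node scalar gate

    def forward(self, h, adj_out, adj_in):
        m_out = adj_out @ h  # messages from out-neighbors
        m_in = adj_in @ h    # messages from in-neighbors
        # Node-level weight on outgoing vs. incoming information.
        alpha = torch.sigmoid(self.gate(torch.cat([m_out, m_in], dim=-1)))
        return alpha * m_out + (1 - alpha) * m_in
```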
Existing multiplex graph representation learning (MGRL) methods have been shown to 1) ignore the globally positive and negative relationships among node features, and 2) usually utilize the node classification task to train both graph structure learning and representation learning parameters, resulting in the problem of edge starvation. To address these issues, in this paper, we propose a new MGRL method based on bi-level optimization. Specifically, at the inner level, we optimize the self-expression matrix to capture the globally positive and negative relationships among nodes, and complement them with the local relationships in graph structures. At the outer level, we optimize the parameters of the graph convolutional layer to obtain discriminative node representations. As a result, the graph structure optimization does not depend on the node classification task, which solves the edge starvation problem. Extensive experiments show that our model achieves superior performance on node classification across all datasets.
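The inner level admits a closed form under the common ridge-regularized self-expression objective min_C ||X - CX||_F^2 + lam ||C||_F^2 (an assumed formulation; the paper's inner objective may differ in its regularizers). The signs of the resulting C distinguish globally positive from negative relationships:

```python
import numpy as np

def self_expression(X, lam=1.0):
    # Gram matrix over node features X (num_nodes x feat_dim).
    G = X @ X.T
    # Closed-form minimizer of the ridge self-expression objective:
    # C = G (G + lam I)^{-1}.
    C = G @ np.linalg.inv(G + lam * np.eye(G.shape[0]))
    return C  # C[i, j] > 0: positive relation; C[i, j] < 0: negative relation
```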
Points of interest (POIs) carry a wealth of semantic information about varying locations in cities and thus have been widely used to enable various location-based services. To understand POI semantics, existing methods usually model contextual correlations of POI categories in users' check-in sequences and embed categories into a latent space based on the word2vec framework. However, such an approach does not fully capture the underlying hierarchical relationship between POI categories and can hardly integrate the category hierarchy into various deep sequential models. To overcome this shortcoming, we propose a Semantically Disentangled POI Category Embedding Model (SD-CEM) to generate hierarchy-enhanced category representations using disentangled mobility sequences. Specifically, we first construct disentangled mobility sequences from human mobility data based on the semantics of POIs. Then we utilize the POI category hierarchy to initialize a hierarchy-enhanced representation for each category in the disentangled sequences, employing an attention mechanism. Finally, we optimize these category representations by incorporating both the masked category prediction task and the next category prediction task. To evaluate the effectiveness of SD-CEM, we conduct comprehensive experiments using two check-in datasets covering three tasks. Experimental results demonstrate that SD-CEM outperforms several competitive baselines, highlighting both its substantial performance improvement and its aid to understanding the learned category representations.
With the widespread popularity of massive open online courses (MOOCs), personalized course recommendation has become increasingly important as it enhances users' learning efficiency. While achieving promising performance, current works suffer from the variability across users and other MOOC entities. To address this problem, we propose hierarchical reinforcement learning with a multi-channel hypergraph neural network for course recommendation (HHCoR). Specifically, we first construct an online course hypergraph as the environment to capture the complex relationships and historical information by considering all entities. Then, we design a multi-channel propagation mechanism to aggregate embeddings in the online course hypergraph and extract user interest through an attention layer. Besides, we employ two-level decision-making: the low-level agent focuses on rating courses, while the high-level agent integrates these considerations to finalize the decision. Furthermore, for co-optimization, we design a joint reward function to improve the policies of the two-level agents. Finally, we conducted extensive experiments on two real-world datasets, and the quantitative results demonstrate the effectiveness of the proposed method.
The Hilbert-Schmidt Independence Criterion (HSIC), based on kernel functions, is capable of detecting nonlinear dependencies between variables, making it a common method for association mining. However, with small samples, high dimensions, or noisy data, it may generate spurious associations, assigning non-negligible scores to two unrelated variables. To address this issue, we propose a novel criterion, named the Pure Hilbert-Schmidt Independence Criterion (PHSIC). PHSIC is obtained by subtracting from the original HSIC value the mean HSIC obtained under random conditions. We demonstrate three significant advantages of PHSIC through theoretical and simulation experiments: (1) PHSIC has a baseline of zero, enhancing the interpretability of HSIC. (2) Compared to HSIC, PHSIC exhibits lower bias. (3) PHSIC enables a fairer comparison across different samples and dimensions. To validate the effectiveness of PHSIC, we apply it to multiple causal inference tasks to measure the independence between cause and residual. Experimental results demonstrate that the causal model based on PHSIC performs well compared to other methods in scenarios involving small sample sizes and noisy data, on both real and simulated datasets.
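Following the abstract's definition, a minimal sketch estimates the random-condition mean by permuting one variable (permutation is an assumed choice of "random conditions" here), using the standard biased HSIC estimator with RBF kernels:

```python
import numpy as np

def rbf(X, sigma=1.0):
    # Pairwise squared distances -> Gaussian kernel matrix.
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2 * sigma ** 2))

def hsic(K, L):
    n = K.shape[0]
    H = np.eye(n) - np.ones((n, n)) / n  # centering matrix
    return np.trace(K @ H @ L @ H) / (n - 1) ** 2

def phsic(X, Y, n_perm=100, seed=0):
    rng = np.random.default_rng(seed)
    K, L = rbf(X), rbf(Y)
    base = hsic(K, L)
    # Permuting Y breaks the X-Y association; the average over permutations
    # estimates the spurious score HSIC assigns to unrelated variables.
    null = []
    for _ in range(n_perm):
        p = rng.permutation(len(Y))
        null.append(hsic(K, L[p][:, p]))
    return base - float(np.mean(null))
```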
Conventional deep learning methods typically employ supervised learning for drug response prediction (DRP), which entails dependence on labeled drug response data for model training. However, practical applications in the preclinical drug screening phase demand that DRP models predict responses for novel compounds whose drug responses are often unknown. This presents a challenge, rendering supervised deep learning methods unsuitable for such scenarios. In this paper, we propose a zero-shot learning solution for the DRP task in preclinical drug screening. Specifically, we develop a Multi-branch Multi-Source Domain Adaptation Test Enhancement Plug-in, called MSDA. MSDA can be seamlessly integrated with conventional DRP methods, learning invariant features from the prior response data of similar drugs to enhance real-time predictions for unlabeled compounds. Experiments on two large drug response datasets show that MSDA efficiently predicts drug responses for novel compounds, yielding a general performance improvement of 5-10% in the preclinical drug screening phase. The significance of this solution resides in its potential to accelerate the drug discovery process, improve drug candidate assessment, and facilitate the success of drug discovery. The code is available at https://github.com/DrugD/MSDA.
Deep learning-based drug response prediction (DRP) methods can accelerate the drug discovery process and reduce research and development costs. Despite their high accuracy, generating regression-aware representations remains challenging for mainstream approaches: the representations are often disordered, aggregated, and overlapping, and they fail to characterize distinct samples effectively. This results in poor representations for the DRP task, diminishing generalizability and potentially leading to substantial costs during drug discovery. In this paper, we propose CLDR, a contrastive learning framework with natural language supervision for DRP. CLDR converts regression labels into text, which is merged with the drug response caption as a second sample modality, instead of the traditional modalities, i.e., graphs and sequences. Simultaneously, a common-sense numerical knowledge graph is introduced to improve the continuous text representation. Our framework is validated on the Genomics of Drug Sensitivity in Cancer dataset with average performance increases ranging from 7.8% to 31.4%. Furthermore, experiments demonstrate that the proposed CLDR effectively maps samples with distinct label values into a high-dimensional space. In this space, the sample representations are scattered, significantly alleviating feature overlap. The code is available at: https://github.com/DrugD/CLDR.
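The natural-language supervision can be sketched as a CLIP-style symmetric InfoNCE loss between drug-response embeddings and embeddings of verbalized labels (e.g., a hypothetical caption like "the response value of this pair is 2.3"); batch pairs are assumed to be aligned on the diagonal:

```python
import torch
import torch.nn.functional as F

def clip_style_loss(drug_emb, text_emb, temperature=0.07):
    # Normalize both modalities onto the unit sphere.
    d = F.normalize(drug_emb, dim=-1)
    t = F.normalize(text_emb, dim=-1)
    logits = d @ t.T / temperature
    targets = torch.arange(len(d), device=d.device)  # matched pairs on diagonal
    # Symmetric cross-entropy: drug-to-text and text-to-drug.
    return (F.cross_entropy(logits, targets) +
            F.cross_entropy(logits.T, targets)) / 2
```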
Knowledge Graphs (KGs) are pivotal in various NLP applications but often grapple with incompleteness, especially due to the long-tail problem, where infrequent, unpopular relationships drastically reduce KG completion performance. In this paper, we focus on Few-shot Knowledge Graph Completion (FKGC), a task addressing these gaps in long-tail scenarios. Amidst the rapid evolution of Large Language Models (LLMs), we propose a generation-based FKGC paradigm facilitated by LLM distillation. Our MuKDC framework employs multi-level knowledge distillation for few-shot KG completion, generating supplementary knowledge to mitigate data scarcity in few-shot environments. MuKDC comprises two primary components: Multi-level Knowledge Generation, which enriches the KG at various levels, and Consistency Assessment, which ensures the coherence and reliability of the generated knowledge. Most notably, our method achieves SOTA results on both FKGC and multi-modal FKGC benchmarks, significantly advancing KG completion and enhancing the understanding and application of LLMs in structured knowledge generation and assessment.
Graph classification benchmarks, vital for assessing and developing graph neural network (GNN) models, have recently been scrutinized, as simple methods like MLPs have demonstrated comparable performance. This leads to an important question: do these benchmarks effectively distinguish the advancements of GNNs over other methodologies? And if so, how do we quantitatively measure this effectiveness? In response, we first propose an empirical protocol based on a fair benchmarking framework to investigate the performance discrepancy between simple methods and GNNs. We further propose a novel metric to quantify dataset effectiveness by considering both dataset complexity and model performance. To the best of our knowledge, our work is the first to thoroughly study and provide an explicit definition of dataset effectiveness in the graph learning area. Through testing across 16 real-world datasets, we find that our metric aligns with existing studies and intuitive assumptions. Finally, we explore the causes behind the low effectiveness of certain datasets by investigating the correlation between intrinsic graph properties and class labels, and we develop a novel technique supporting correlation-controllable synthetic dataset generation. Our findings shed light on the current understanding of benchmark datasets, and our new platform could fuel the future evolution of graph classification benchmarks.
Recently, dual-target cross-domain recommendation (DTCDR) has been proposed to alleviate the data sparsity problem by sharing common knowledge across domains simultaneously. However, existing methods often assume that personal data containing abundant identifiable information can be directly accessed, which raises a controversial privacy-leakage problem for DTCDR. To this end, we introduce P2DTR, a novel DTCDR framework that protects private user information. Specifically, we first design a novel inter-client knowledge extraction mechanism, which exploits the private set intersection algorithm and prototype-based federated learning to enable collaborative modeling among multiple users and a server. Furthermore, to improve recommendation performance based on the extracted common knowledge across domains, we propose an intra-client enhanced recommendation, consisting of a constrained dominant set (CDS) propagation mechanism and a dual-recommendation module. Extensive experiments on real-world datasets validate that our proposed P2DTR framework achieves superior utility under a privacy-preserving guarantee on both domains.
Self-supervised methods have gained prominence in time series anomaly detection due to the scarcity of available annotations. Nevertheless, they typically demand extensive training data to acquire a generalizable representation map, which conflicts with scenarios with few available samples, thereby limiting their performance. To overcome this limitation, we propose AnomalyLLM, a knowledge distillation-based time series anomaly detection approach in which the student network is trained to mimic the features of the large language model (LLM)-based teacher network that is pretrained on large-scale datasets. During the testing phase, anomalies are detected when the discrepancy between the features of the teacher and student networks is large. To prevent the student network from learning the teacher network's features on anomalous samples, we devise two key strategies: 1) prototypical signals are incorporated into the student network to consolidate normal feature extraction, and 2) synthetic anomalies are used to enlarge the representation gap between the two networks. AnomalyLLM demonstrates state-of-the-art performance on 15 datasets, improving accuracy by at least 14.5% on the UCR dataset.
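The detection rule reduces to a feature-discrepancy score. A minimal sketch, assuming frozen `teacher` and `student` feature extractors that map a time-series window to an embedding:

```python
import torch
import torch.nn.functional as F

def anomaly_score(teacher, student, window):
    # The student mimics the teacher on normal data only, so the feature
    # gap widens on anomalous windows.
    with torch.no_grad():
        t_feat = teacher(window)
        s_feat = student(window)
    return 1 - F.cosine_similarity(t_feat, s_feat, dim=-1)  # high = anomalous
```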
Graph Transformers (GTs) have demonstrated their advantages across a wide range of tasks. However, the self-attention mechanism in GTs overlooks the graph's inductive biases, particularly biases related to structure, which are crucial for graph tasks. Although some methods utilize positional encoding and attention bias to model inductive biases, their effectiveness remains suboptimal. Therefore, this paper presents Gradformer, a method that innovatively integrates GTs with intrinsic inductive bias by applying an exponential decay mask to the attention matrix. Specifically, the values in the decay mask diminish exponentially as node proximity decreases within the graph structure. This design enables Gradformer to retain its ability to capture information from distant nodes while focusing on the graph's local details. Furthermore, Gradformer introduces a learnable constraint into the decay mask, allowing different attention heads to learn distinct decay masks. Such a design diversifies the attention heads, enabling a more effective assimilation of diverse structural information within the graph. Extensive experiments on various benchmarks demonstrate that Gradformer consistently outperforms Graph Neural Network (GNN) and GT baseline models in various graph classification and regression tasks. Additionally, Gradformer has proven to be an effective method for training deep GT models, maintaining or even enhancing accuracy as the network deepens, in contrast to the significant accuracy drop observed in other GT models. Codes are available at https://github.com/LiuChuang0059/Gradformer.
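A minimal sketch of a decay-masked attention head, assuming precomputed pairwise shortest-path distances `spd` as the node-proximity measure and a learnable per-head scalar `log_gamma` (both parameterizations are assumptions; the paper's exact form may differ):

```python
import torch

def decayed_attention(Q, K, V, spd, log_gamma):
    # Standard scaled dot-product attention weights.
    scores = torch.softmax(Q @ K.transpose(-2, -1) / Q.shape[-1] ** 0.5, dim=-1)
    # Learnable decay factor kept in (0, 1); the mask decays exponentially
    # with graph distance, damping attention to remote nodes.
    gamma = torch.sigmoid(log_gamma)
    masked = scores * gamma ** spd
    masked = masked / masked.sum(dim=-1, keepdim=True).clamp_min(1e-9)
    return masked @ V
```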
Graph masked autoencoders (GMAE) have emerged as a significant advancement in self-supervised pre-training for graph-structured data. Previous GMAE models primarily utilize a straightforward random masking strategy for nodes or edges during training. However, this strategy fails to consider the varying significance of different nodes within the graph structure. In this paper, we investigate the potential of leveraging the graph's structural composition as a fundamental and unique prior in the masked pre-training process. To this end, we introduce a novel structure-guided masking strategy (i.e., StructMAE), designed to refine existing GMAE models. StructMAE involves two steps: 1) Structure-based Scoring: each node is evaluated and assigned a score reflecting its structural significance; two distinct scoring manners are proposed, predefined and learnable scoring. 2) Structure-guided Masking: with the obtained assessment scores, we develop an easy-to-hard masking strategy that gradually increases the structural awareness of the self-supervised reconstruction task. Specifically, the strategy begins with random masking and progresses to masking structure-informative nodes based on the assessment scores. This design gradually and effectively guides the model in learning graph structural information. Furthermore, extensive experiments consistently demonstrate that our StructMAE method outperforms existing state-of-the-art GMAE models in both unsupervised and transfer learning tasks. Codes are available at https://github.com/LiuChuang0059/StructMAE.
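The easy-to-hard schedule can be sketched as an annealed blend of random noise and structural scores, assuming `scores` is normalized to [0, 1] and `progress` runs from 0 (start of training) to 1 (end); the names here are illustrative:

```python
import torch

def structmae_mask(scores, mask_ratio, progress):
    # Early training (progress ~ 0): masking is essentially random.
    # Late training (progress ~ 1): the most structure-informative
    # nodes (highest scores) are masked, hardening the reconstruction task.
    n = scores.shape[0]
    blended = progress * scores + (1 - progress) * torch.rand(n)
    return blended.topk(int(mask_ratio * n)).indices  # indices of nodes to mask
```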
The standardized Geometry-based Point Cloud Compression (G-PCC) suffers from limited coding performance and low-quality reconstruction. To address this, we propose AuxGR, a performance-complexity tradeoff solution for point cloud geometry restoration that leverages an auxiliary bitstream to enhance the quality of G-PCC compressed point cloud geometry. This auxiliary bitstream efficiently encapsulates spatio-temporal information. For static coding, we perform paired information embedding (PIE) on the G-PCC decoded frame by employing target convolutions from its original counterpart, producing an auxiliary bitstream containing abundant original information. For dynamic coding, in addition to PIE, we propose temporal information embedding (TIE) to capture motion information between the previously restored and the current G-PCC decoded frames. TIE applies target kNN attention between them, which ensures temporal neighborhood construction for each point and implicitly represents motion. Due to the similarity across temporal frames, only the residuals between the TIE and PIE outputs are compressed into the auxiliary bitstream. Experimental results demonstrate that AuxGR notably outperforms existing methods in both static and dynamic coding scenarios. Moreover, our framework enables the flexible incorporation of auxiliary information under computation constraints, which is attractive for real-world applications.
Continual Knowledge Graph Embedding (CKGE) aims to efficiently learn new knowledge while preserving old knowledge. Dominant approaches primarily focus on alleviating catastrophic forgetting of old knowledge but neglect efficient learning for the emergence of new knowledge. However, in real-world scenarios, knowledge graphs (KGs) are continuously growing, which makes fine-tuning KGE models efficiently a significant challenge. To address this issue, we propose a fast CKGE framework (FastKGE), incorporating an incremental low-rank adapter (IncLoRA) mechanism to efficiently acquire new knowledge while preserving old knowledge. Specifically, to mitigate catastrophic forgetting, FastKGE isolates and allocates new knowledge to specific layers based on the fine-grained influence between old and new KGs. Subsequently, to accelerate fine-tuning, FastKGE devises an efficient IncLoRA mechanism, which embeds the specific layers into incremental low-rank adapters with fewer training parameters. Moreover, IncLoRA introduces adaptive rank allocation, which makes the adapters aware of the importance of entities and adjusts their rank scale adaptively. We conduct experiments on four public datasets and two new datasets with a larger initial scale. Experimental results demonstrate that FastKGE reduces training time by 34%-49% while still achieving competitive link prediction performance against state-of-the-art models on the four public datasets (average MRR score of 21.0% vs. 21.1%). Meanwhile, on the two newly constructed datasets, FastKGE saves 51%-68% training time and improves link prediction performance by 1.5%.
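The IncLoRA idea can be sketched as a frozen embedding table plus a trainable low-rank correction; the fixed rank below is a simplification of the paper's adaptive rank allocation, and all names are illustrative:

```python
import torch
import torch.nn as nn

class IncLoRAEmbedding(nn.Module):
    def __init__(self, num_entities, dim, rank=8):
        super().__init__()
        # Old knowledge: frozen to avoid catastrophic forgetting.
        self.base = nn.Embedding(num_entities, dim)
        self.base.weight.requires_grad = False
        # New knowledge: low-rank increment with few trainable parameters.
        self.A = nn.Parameter(torch.randn(num_entities, rank) * 0.01)
        self.B = nn.Parameter(torch.zeros(rank, dim))

    def forward(self, idx):
        return self.base(idx) + self.A[idx] @ self.B
```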
Pre-travel recommendation aims to provide a variety of out-of-town Points of Interest (POIs) to users who plan to travel away from their hometowns but have not yet decided on a destination. Existing out-of-town recommender systems work by constructing users' latent preferences and inferring travel intentions from their check-in sequences. However, two challenges still hamper the performance of these approaches: i) users' interaction data (including hometown and out-of-town check-ins) tend to be sparse, while candidate POIs from different regions carry diverse semantic information; ii) the causes of user check-ins include not only interest but also conformity, which are easily entangled and overlooked. To fill these gaps, we propose a Knowledge-Driven Disentangled Causal metric learning framework (KDDC) that mitigates interaction data sparsity by enhancing POI semantic representation and considers the distributions of the two causes (i.e., conformity and interest) for pre-travel recommendation. Specifically, we pretrain a constructed POI attribute knowledge graph through a segmented interaction method, and POI semantic information is aggregated via relational heterogeneity. In addition, we devise disentangled causal metric learning to model and infer user-related representations. Extensive experiments on two real-world nationwide datasets display the consistent superiority of our KDDC over state-of-the-art baselines.
Due to complex and dynamic traffic contexts, the interpretability and uncertainty of traffic forecasting have gained increasing attention. Significance testing is a powerful statistical tool for determining whether a hypothesis is valid, facilitating the identification of the pivotal features that predominantly contribute to the true relationship. However, existing works mainly regard traffic forecasting as a deterministic problem, making it challenging to perform effective significance testing. To fill this gap, we propose to conduct Full Bayesian Significance Testing for Neural Networks in Traffic Forecasting, namely ST-nFBST. A Bayesian neural network is utilized to capture the complicated traffic relationships through an optimization function resolved in the context of aleatoric and epistemic uncertainty. Thereupon, ST-nFBST achieves significance testing by means of a delicate gradient-based evidence value, further capturing the inherent traffic schema for better spatiotemporal modeling. Extensive experiments are conducted on METR-LA and PEMS-BAY to verify the advantages of our method in terms of uncertainty analysis and significance testing, aiding the interpretability and improvement of traffic forecasting.