IJCAI.2024 - Multidisciplinary Topics and Applications

| Total: 46

#1 Predictive Modeling with Temporal Graphical Representation on Electronic Health Records [PDF3] [Copy] [Kimi1] [REL]

Authors: Jiayuan Chen, Changchang Yin, Yuanlong Wang, Ping Zhang

Deep learning-based predictive models, leveraging Electronic Health Records (EHR), are receiving increasing attention in healthcare. An effective representation of a patient's EHR should hierarchically encompass both the temporal relationships between historical visits and medical events, and the inherent structural information within these elements. Existing patient representation methods can be roughly categorized into sequential representation and graphical representation. The sequential representation methods focus only on the temporal relationships among longitudinal visits. On the other hand, the graphical representation approaches, while adept at extracting the graph-structured relationships between various medical events, fall short in effectively integrate temporal information. To capture both types of information, we model a patient's EHR as a novel temporal heterogeneous graph. This graph includes historical visits nodes and medical events nodes. It propagates structured information from medical event nodes to visit nodes and utilizes time-aware visit nodes to capture changes in the patient's health status. Furthermore, we introduce a novel temporal graph transformer (TRANS) that integrates temporal edge features, global positional encoding, and local structural encoding into heterogeneous graph convolution, capturing both temporal and structural information. We validate the effectiveness of TRANS through extensive experiments on three real-world datasets. The results show that our proposed approach achieves state-of-the-art performance.


#2 Federated Prompt Learning for Weather Foundation Models on Devices [PDF] [Copy] [Kimi2] [REL]

Authors: Shengchao Chen, Guodong Long, Tao Shen, Jing Jiang, Chengqi Zhang

On-device intelligence for weather forecasting uses local deep learning models to analyze weather patterns without centralized cloud computing, holds significance for supporting human activates. Federated Learning is a promising solution for such forecasting by enabling collaborative model training without sharing raw data. However, it faces three main challenges that hinder its reliability: (1) data heterogeneity among devices due to geographic differences; (2) data homogeneity within individual devices and (3) communication overload from sending large model parameters for collaboration. To address these challenges, this paper propose Federated Prompt learning for Weather Foundation Models on Devices (FedPoD), which enables devices to obtain highly customized models while maintaining communication efficiency. Concretely, our Adaptive Prompt Tuning leverages lightweight prompts guide frozen foundation model to generate more precise predictions, also conducts prompt-based multi-level communication to encourage multi-source knowledge fusion and regulate optimization. Additionally, Dynamic Graph Modeling constructs graphs from prompts, prioritizing collaborative training among devices with similar data distributions to against heterogeneity. Extensive experiments demonstrates FedPoD leads the performance among state-of-the-art baselines across various setting in real-world on-device weather forecasting datasets.


#3 Shadow-Free Membership Inference Attacks: Recommender Systems Are More Vulnerable Than You Thought [PDF] [Copy] [Kimi] [REL]

Authors: Xiaoxiao Chi, Xuyun Zhang, Yan Wang, Lianyong Qi, Amin Beheshti, Xiaolong Xu, Kim-Kwang Raymond Choo, Shuo Wang, Hongsheng Hu

Recommender systems have been successfully applied in many applications. Nonetheless, recent studies demonstrate that recommender systems are vulnerable to membership inference attacks (MIAs), leading to the leakage of users’ membership privacy. However, existing MIAs relying on shadow training suffer a large performance drop when the attacker lacks knowledge of the training data distribution and the model architecture of the target recommender system. To better understand the privacy risks of recommender systems, we propose shadow-free MIAs that directly leverage a user’s recommendations for membership inference. Without shadow training, the proposed attack can conduct MIAs efficiently and effectively under a practice scenario where the attacker is given only black-box access to the target recommender system. The proposed attack leverages an intuition that the recommender system personalizes a user’s recommendations if his historical interactions are used by it. Thus, an attacker can infer membership privacy by determining whether the recommendations are more similar to the interactions or the general popular items. We conduct extensive experiments on benchmark datasets across various recommender systems. Remarkably, our attack achieves far better attack accuracy with low false positive rates than baselines while with a much lower computational cost.


#4 Geometry-Guided Conditional Adaptation for Surrogate Models of Large-Scale 3D PDEs on Arbitrary Geometries [PDF] [Copy] [Kimi] [REL]

Authors: Jingyang Deng, Xingjian Li, Haoyi Xiong, Xiaoguang Hu, Jinwen Ma

Deep learning surrogate models aim to accelerate the solving of partial differential equations (PDEs) and have achieved certain promising results. Although several main-stream models through neural operator learning have been applied to delve into PDEs on varying geometries, they were designed to map the complex geometry to a latent uniform grid, which is still challenging to learn by the networks with general architectures. In this work, we rethink the critical factors of PDE solutions and propose a novel model-agnostic framework, called 3D Geometry-Guided Conditional adaptation (3D-GeoCA), for solving PDEs on arbitrary 3D geometries. Starting with a 3D point cloud geometry encoder, 3D-GeoCA can extract the essential and robust representations of any kind of geometric shapes, which conditionally guides the adaptation of hidden features in the surrogate model. We conduct experiments on two public computational fluid dynamics datasets, the Shape-Net Car and Ahmed-Body dataset, using several surrogate models as the backbones with various point cloud geometry encoders to simulate corresponding large-scale Reynolds Average Navier-Stokes equations. Equipped with 3D-GeoCA, these backbone models can reduce the L-2 error by a large margin. Moreover, this 3D-GeoCA is model-agnostic so that it can be applied to any surrogate model.


#5 Linear-Time Optimal Deadlock Detection for Efficient Scheduling in Multi-Track Railway Networks [PDF] [Copy] [Kimi] [REL]

Authors: Hastyn Doshi, Ayush Tripathi, Keshav Agarwal, Harshad Khadilkar, Shivaram Kalyanakrishnan

The railway scheduling problem requires the computation of an operable timetable that satisfies constraints involving railway infrastructure and resource occupancy times, while minimising average delay over a set of events. Since this problem is computationally hard, practical solutions typically roll out feasible (but suboptimal) schedules one step at a time, by choosing which train to move next in every step. The choices made by such algorithms are necessarily myopic, and incur the risk of driving the system to a deadlock. To escape deadlocks, the predominant approach is to stay away from states flagged as potentially unsafe by some fast-to-compute rule R. While many choices of R guarantee deadlock avoidance, they are suboptimal in the sense of also flagging some safe states as unsafe. In this paper, we revisit the literature on process scheduling and describe a rule R0 that is (i) necessary and sufficient for deadlock detection when the network has at least two tracks in each resource (station / track section), (ii) computable in linear time, and (iii) yields lower delays when combined with existing scheduling algorithms on both synthetic and real data sets from Indian Railways.


#6 MMGNN: A Molecular Merged Graph Neural Network for Explainable Solvation Free Energy Prediction [PDF] [Copy] [Kimi] [REL]

Authors: Wenjie Du, Shuai Zhang, Di Wu, Jun Xia, Ziyuan Zhao, Junfeng Fang, Yang Wang

In this paper, we address the challenge of accurately modeling and predicting Gibbs free energy in solute-solvent interactions, a pivotal yet complex aspect in the field of chemical modeling. Traditional approaches, primarily relying on deep learning models, face limitations in capturing the intricate dynamics of these interactions. To overcome these constraints, we introduce a novel framework, molecular modeling graph neural network (MMGNN), which more closely mirrors real-world chemical processes.Specifically, MMGNN explicitly models atomic interactions such as hydrogen bonds by initially forming indiscriminate connections between intermolecular atoms, which are then refined using an attention-based aggregation method, tailoring to specific solute-solvent pairs. To address the challenges of non-interactive or repulsive atomic interactions, MMGNN incorporates interpreters for nodes and edges in the merged graph, enhancing explainability and reducing redundancy. MMGNN stands as the first framework to explicitly align with real chemical processes, providing a more accurate and scientifically sound approach to modeling solute-solvent interactions. The infusion of explainability allows for the extraction of key subgraphs, which are pivotal for further research in solute-solvent dynamics. Extensive experimental validation confirms the efficacy and enhanced explainability of MMGNN.


#7 VF-Detector: Making Multi-Granularity Code Changes on Vulnerability Fix Detector Robust to Mislabeled Changes [PDF] [Copy] [Kimi] [REL]

Authors: Zhenkan Fu, Shikai Guo, Hui Li, Rong Chen, Xiaochen Li, He Jiang

As software development projects increasingly rely on open-source software, users face the risk of security vulnerabilities from third-party libraries. To address label and character noise in code changes, we present VF-Detector to automatically identifying bug-fix commits in actual noise development environment. VF-Detector consists of three componments: Data Pre-processing (DP), Vulnerability Confidence Computation (VCC) and Confidence Learning Denoising (CLD). The DP component is responsible for preprocessing code change data. The VCC component calculates code change confidence value for each bug-fix by extracting features at various granularity levels. The CLD component removes noise and enhances model robustness by pruning noisy data with confidence values and performing effort-aware adjustments. Experimental results demonstrate VF-Detector's superiority over state-of-the-art methods in EffortCost@L and Popt@L metrics on Java and Python datasets. The improvements were 6.5% and 5% for Java, and 23.4% and 17.8% for Python.


#8 Enhancing Fine-Grained Urban Flow Inference via Incremental Neural Operator [PDF1] [Copy] [Kimi] [REL]

Authors: Qiang Gao, Xiaolong Song, Li Huang, Goce Trajcevski, Fan Zhou, Xueqin Chen

Fine-grained urban flow inference (FUFI), which involves inferring fine-grained flow maps from their coarse-grained counterparts, is of tremendous interest in the realm of sustainable urban traffic services. To address the FUFI, existing solutions mainly concentrate on investigating spatial dependencies, introducing external factors, reducing excessive memory costs, etc., -- while rarely considering the catastrophic forgetting (CF) problem. Motivated by recent operator learning, we present an Urban Neural Operator solution with Incremental learning (UNOI), primarily seeking to learn grained-invariant solutions for FUFI in addition to addressing CF. Specifically, we devise an urban neural operator (UNO) in UNOI that learns mappings between approximation spaces by treating the different-grained flows as continuous functions, allowing a more flexible capture of spatial correlations. Furthermore, the phenomenon of CF behind time-related flows could hinder the capture of flow dynamics. Thus, UNOI mitigates CF concerns as well as privacy issues by placing UNO blocks in two incremental settings, i.e., flow-related and task-related. Experimental results on large-scale real-world datasets demonstrate the superiority of our proposed solution against the baselines.


#9 InstructME: An Instruction Guided Music Edit Framework with Latent Diffusion Models [PDF] [Copy] [Kimi1] [REL]

Authors: Bing Han, Junyu Dai, Weituo Hao, Xinyan He, Dong Guo, Jitong Chen, Yuxuan Wang, Yanmin Qian, Xuchen Song

Music editing primarily entails the modification of instrument tracks or remixing in the whole, which offers a novel reinterpretation of the original piece through a series of operations. These music processing methods hold immense potential across various applications but demand substantial expertise. Prior methodologies, although effective for image and audio modifications, falter when directly applied to music. This is attributed to music's distinctive data nature, where such methods can inadvertently compromise the intrinsic harmony and coherence of music. In this paper, we develop InstructME, an Instruction guided Music Editing and remixing framework based on latent diffusion models. Our framework fortifies the U-Net with multi-scale aggregation in order to maintain consistency before and after editing. In addition, we introduce chord progression matrix as condition information and incorporate it in the semantic space to improve melodic harmony while editing. For accommodating extended musical pieces, InstructME employs a chunk transformer, enabling it to discern long-term temporal dependencies within music sequences. We tested InstructME in instrument-editing, remixing, and multi-round editing. Both subjective and objective evaluations indicate that our proposed method significantly surpasses preceding systems in music quality, text relevance and harmony. Demo samples are available at https://musicedit.github.io


#10 Dynamicity-aware Social Bot Detection with Dynamic Graph Transformers [PDF] [Copy] [Kimi] [REL]

Authors: Buyun He, Yingguang Yang, Qi Wu, Hao Liu, Renyu Yang, Hao Peng, Xiang Wang, Yong Liao, Pengyuan Zhou

Detecting social bots has evolved into a pivotal yet intricate task, aimed at combating the dissemination of misinformation and preserving the authenticity of online interactions. While earlier graph-based approaches, which leverage topological structure of social networks, yielded notable outcomes, they overlooked the inherent dynamicity of social networks -- In reality, they largely depicted the social network as a static graph and solely relied on its most recent state. Due to the absence of dynamicity modeling, such approaches are vulnerable to evasion, particularly when advanced social bots interact with other users to camouflage identities and escape detection. To tackle these challenges, we propose BotDGT, a novel framework that not only considers the topological structure, but also effectively incorporates dynamic nature of social network. Specifically, we characterize a social network as a dynamic graph. A structural module is employed to acquire topological information from each historical snapshot. Additionally, a temporal module is proposed to integrate historical context and model the evolving behavior patterns exhibited by social bots and legitimate users. Experimental results demonstrate the superiority of BotDGT against the leading methods that neglected the dynamic nature of social networks in terms of accuracy, recall, and F1-score.


#11 Physics-Informed Neural Networks: Minimizing Residual Loss with Wide Networks and Effective Activations [PDF1] [Copy] [Kimi1] [REL]

Authors: Nima Hosseini Dashtbayaz, Ghazal Farhani, Boyu Wang, Charles X. Ling

The residual loss in Physics-Informed Neural Networks (PINNs) alters the simple recursive relation of layers in a feed-forward neural network by applying a differential operator, resulting in a loss landscape that is inherently different from those of common supervised problems. Therefore, relying on the existing theory leads to unjustified design choices and suboptimal performance. In this work, we analyze the residual loss by studying its characteristics at critical points to find the conditions that result in effective training of PINNs. Specifically, we first show that under certain conditions, the residual loss of PINNs can be globally minimized by a wide neural network. Furthermore, our analysis also reveals that an activation function with well-behaved high-order derivatives plays a crucial role in minimizing the residual loss. In particular, to solve a k-th order PDE, the k-th derivative of the activation function should be bijective. The established theory paves the way for designing and choosing effective activation functions for PINNs and explains why periodic activations have shown promising performance in certain cases. Finally, we verify our findings by conducting a set of experiments on several PDEs. Our code is publicly available at https://github.com/nimahsn/pinns_tf2.


#12 Delocate: Detection and Localization for Deepfake Videos with Randomly-Located Tampered Traces [PDF] [Copy] [Kimi] [REL]

Authors: Juan Hu, Xin Liao, Difei Gao, Satoshi Tsutsui, Qian Wang, Zheng Qin, Mike Zheng Shou

Deepfake videos are becoming increasingly realistic, showing few tampering traces on facial areas that vary between frames. Consequently, existing Deepfake detection methods struggle to detect unknown domain Deepfake videos while accurately locating the tampered region. To address this limitation, we propose Delocate, a novel Deepfake detection model that can both recognize and localize unknown domain Deepfake videos. Our method consists of two stages named recovering and localization. In the recovering stage, the model randomly masks regions of interest (ROIs) and reconstructs real faces without tampering traces, leading to a relatively good recovery effect for real faces and a poor recovery effect for fake faces. In the localization stage, the output of the recovery phase and the forgery ground truth mask serve as supervision to guide the forgery localization process. This process strategically emphasizes the recovery phase of fake faces with poor recovery, facilitating the localization of tampered regions. Our extensive experiments on four widely used benchmark datasets demonstrate that Delocate not only excels in localizing tampered areas but also enhances cross-domain detection performance.


#13 Personalized Heart Disease Detection via ECG Digital Twin Generation [PDF] [Copy] [Kimi] [REL]

Authors: Yaojun Hu, Jintai Chen, Lianting Hu, Dantong Li, Jiahuan Yan, Haochao Ying, Huiying Liang, Jian Wu

Heart diseases rank among the leading causes of global mortality, demonstrating a crucial need for early diagnosis and intervention. Most traditional electrocardiogram (ECG) based automated diagnosis methods are trained at population level, neglecting the customization of personalized ECGs to enhance individual healthcare management. A potential solution to address this limitation is to employ digital twins to simulate symptoms of diseases in real patients. In this paper, we present an innovative prospective learning approach for personalized heart disease detection, which generates digital twins of healthy individuals' anomalous ECGs and enhances the model sensitivity to the personalized symptoms. In our approach, a vector quantized feature separator is proposed to locate and isolate the disease symptom and normal segments in ECG signals with ECG report guidance. Thus, the ECG digital twins can simulate specific heart diseases used to train a personalized heart disease detection model. Experiments demonstrate that our approach not only excels in generating high-fidelity ECG signals but also improves personalized heart disease detection. Moreover, our approach ensures robust privacy protection, safeguarding patient data in model development. The code can be found at https://github.com/huyjj/LAVQ-Editor.


#14 ATTA: Adaptive Test-Time Adaptation for Multi-Modal Sleep Stage Classification [PDF1] [Copy] [Kimi2] [REL]

Authors: Ziyu Jia, Xihao Yang, Chenyang Zhou, Haoyang Deng, Tianzi Jiang

Sleep stage classification is crucial for sleep quality assessment and disease diagnosis. Although some recent studies have made great strides in sleep stage classification performance, direct application to multi-modal sleep data with cross-domain distributional variations still poses challenges: 1) How to retain the sleep knowledge acquired by the model from the source domain during cross-domain adaptation to avoid catastrophic forgetting. 2) How to evaluate the contribution of different modalities in identifying specific sleep stages to serve test-time adaptation (TTA). 3) How to dynamically adapt the sleep model to different distribution shift in data domains of different subjects. To address these challenges, we propose an Adaptive Test-Time Adaptation (ATTA) method, a multi-modal test-time adaptation method for sleep stage classification. Specifically, the intra-modal retained-adaptive module is proposed for adapting to the target domain data while retaining the sleep knowledge acquired from the source domain to avoid catastrophic forgetting. The inter-modal contribution assessment module is designed to adaptively assess the contribution of each modality to the identification of specific sleep stages. Furthermore, the adaptive learning rate strategy utilizes a memory bank to record data from different subjects during testing, and based on this, it measures the differences between the target subject and those in the memory bank. According to the difference, the model adapts to the subject samples with different learning rates. We conduct experiments on mutual migration on two sleep datasets, SleepEDF and SHHS. The results show that our ATTA method outperforms state-of-the-art baselines in sleep stage classification.


#15 Vertical Symbolic Regression via Deep Policy Gradient [PDF] [Copy] [Kimi] [REL]

Authors: Nan Jiang, Md Nasim, Yexiang Xue

Vertical Symbolic Regression (VSR) has recently been proposed to expedite the discovery of symbolic equations with many independent variables from experimental data. VSR reduces the search spaces following the vertical discovery path by building from reduced-form equations involving a subset of variables to all variables. While deep neural networks have shown promise in enhancing symbolic regression, directly integrating VSR with deep networks faces challenges such as gradient propagation and engineering complexities due to the tree representation of expressions. We propose Vertical Symbolic Regression using Deep Policy Gradient (VSR-DPG) and demonstrate that VSR-DPG can recover ground-truth equations involving multiple input variables, significantly beyond both deep reinforcement learning-based approaches and previous VSR variants. Our VSR-DPG models symbolic regression as a sequential decision-making process, in which equations are built from repeated applications of grammar rules. The integrated deep model is trained to maximize a policy gradient objective. Experimental results demonstrate that our VSR-DPG significantly outperforms popular baselines in identifying both algebraic equations and ordinary differential equations on a series of benchmarks.


#16 Privacy-Preserving UCB Decision Process Verification via zk-SNARKs [PDF] [Copy] [Kimi] [REL]

Authors: Xikun Jiang, He Lyu, Chenhao Ying, Yibin Xu, Boris Düdder, Yuan Luo

With the increasingly widespread application of machine learning, how to strike a balance between protecting the privacy of data and algorithm parameters and ensuring the verifiability of machine learning has always been a challenge. This study explores the intersection of reinforcement learning and data privacy, specifically addressing the Multi-Armed Bandit (MAB) problem with the Upper Confidence Bound (UCB) algorithm. We introduce zkUCB, an innovative algorithm that employs the Zero-Knowledge Succinct Non-Interactive Argument of Knowledge (zk-SNARKs) to enhance UCB. zkUCB is carefully designed to safeguard the confidentiality of training data and algorithmic parameters, ensuring transparent UCB decision-making. Experiments highlight zkUCB's superior performance, attributing its enhanced reward to judicious quantization bit usage that reduces information entropy in the decision-making process. zkUCB's proof size and verification time scale linearly with the execution steps of zkUCB. This showcases zkUCB's adept balance between data security and operational efficiency. This approach contributes significantly to the ongoing discourse on reinforcing data privacy in complex decision-making processes, offering a promising solution for privacy-sensitive applications.


#17 Stochastic Neural Simulator for Generalizing Dynamical Systems across Environments [PDF] [Copy] [Kimi] [REL]

Authors: Liu Jiaqi, Jiaxu Cui, Jiayi Yang, Bo Yang

Neural simulators for modeling complex dynamical systems have been extensively studied for various real-world applications, such as weather forecasting, ocean current prediction, and computational fluid dynamics simulation. Although they have demonstrated powerful fitting and predicting, most existing models are only built to learn single-system dynamics. Several advanced researches have considered learning dynamics across environments, which can exploit the potential commonalities among the dynamics across environments and adapt to new environments. However, these methods still are prone to scarcity problems where per-environment data is sparse or limited. Therefore, we propose a novel CoNDP (Context-Informed Neural ODE Processes) to achieve learning system dynamics from sparse observations across environments. It can fully use contextual information of each environment to better capture the intrinsic commonalities across environments and distinguishable differences among environments while modeling uncertainty of system evolution, producing more accurate predictions. Intensive experiments are conducted on five complex dynamical systems in various fields. Results show that the proposed CoNDP can achieve optimal results compared with common neural simulators and state-of-the-art cross-environmental models.


#18 Enhancing Length Generalization for Attention Based Knowledge Tracing Models with Linear Biases [PDF1] [Copy] [Kimi1] [REL]

Authors: Xueyi Li, Youheng Bai, Teng Guo, Zitao Liu, Yaying Huang, Xiangyu Zhao, Feng Xia, Weiqi Luo, Jian Weng

Knowledge tracing (KT) is the task of predicting students' future performance based on their historical learning interaction data. With the rapid advancement of attention mechanisms, many attention based KT models are developed. However, existing attention based KT models exhibit performance drops as the number of student interactions increases beyond the number of interactions on which the KT models are trained. We refer to this as the length generalization of KT model. In this paper, we propose stableKT to enhance length generalization that is able to learn from short sequences and maintain high prediction performance when generalizing on long sequences. Furthermore, we design a multi-head aggregation module to capture the complex relationships between questions and the corresponding knowledge components (KCs) by combining dot-product attention and hyperbolic attention. Experimental results on three public educational datasets show that our model exhibits robust capability of length generalization and outperforms all baseline models in terms of AUC. To encourage reproducible research, we make our data and code publicly available at https://pykt.org.


#19 OUCopula: Bi-Channel Multi-Label Copula-Enhanced Adapter-Based CNN for Myopia Screening Based on OU-UWF Images [PDF] [Copy] [Kimi] [REL]

Authors: Yang Li, Qiuyi Huang, Chong Zhong, Danjuan Yang, Meiyan Li, A.H. Welsh, Aiyi Liu, Bo Fu, Catherine C. Liu, Xingtao Zhou

Myopia screening using cutting-edge ultra-widefield (UWF) fundus imaging is potentially significant for ophthalmic outcomes. Current multidisciplinary research between ophthalmology and deep learning (DL) concentrates primarily on disease classification and diagnosis using single-eye images, largely ignoring joint modeling and prediction for Oculus Uterque (OU, both eyes). Inspired by the complex relationships between OU and the high correlation between the (continuous) outcome labels (Spherical Equivalent and Axial Length), we propose a framework of copula-enhanced adapter convolutional neural network (CNN) learning with OU UWF fundus images (OUCopula) for joint prediction of multiple clinical scores. We design a novel bi-channel multi-label CNN which can (1) take bi-channel image inputs subject to both high correlation and heterogeneity (by sharing the same backbone network and employing adapters to parameterize the channel-wise discrepancy), and (2) incorporate correlation information between continuous output labels (using a copula). Solid experiments show that OUCopula achieves satisfactory performance in myopia score prediction compared to backbone models. Moreover, OUCopula can far exceed the performance of models constructed for single-eye inputs. Importantly, our study also hints at the potential extension of the bi-channel model to a multi-channel paradigm and the generalizability of OUCopula across various backbone CNNs. The code and the supplementary materials are available at: github.com/Charley-HUANG/OUCopula.


#20 A Cognitive-Driven Trajectory Prediction Model for Autonomous Driving in Mixed Autonomy Environments [PDF] [Copy] [Kimi] [REL]

Authors: Haicheng Liao, Zhenning Li, Chengyue Wang, Bonan Wang, Hanlin Kong, Yanchen Guan, Guofa Li, Zhiyong Cui

As autonomous driving technology progresses, the need for precise trajectory prediction models becomes paramount. This paper introduces an innovative model that infuses cognitive insights into trajectory prediction, focusing on perceived safety and dynamic decision-making. Distinct from traditional approaches, our model excels in analyzing interactions and behavior patterns in mixed autonomy traffic scenarios. We introduce the Macao Connected Autonomous Driving (MoCAD) dataset as part of our contributions, which adds value to its complex urban driving scenarios. Our model represents a significant leap forward, achieving marked performance improvements on several key datasets. Specifically, it surpasses existing benchmarks with gains of 16.2% on the Next Generation Simulation (NGSIM), 27.4% on the Highway Drone (HighD), and 19.8% on the MoCAD dataset. Our proposed model shows exceptional proficiency in handling corner cases, essential for real-world applications. Moreover, its robustness is evident in scenarios with missing or limited data, outperforming most of the state-of-the-art baselines. This adaptability and resilience position our model as a viable tool for real-world autonomous driving systems, heralding a new standard in vehicle trajectory prediction for enhanced safety and efficiency.


#21 MFTraj: Map-Free, Behavior-Driven Trajectory Prediction for Autonomous Driving [PDF] [Copy] [Kimi] [REL]

Authors: Haicheng Liao, Zhenning Li, Chengyue Wang, Huanming Shen, Dongping Liao, Bonan Wang, Guofa Li, Chengzhong Xu

This paper introduces a trajectory prediction model tailored for autonomous driving, focusing on capturing complex interactions in dynamic traffic scenarios without reliance on high-definition maps. The model, termed MFTraj, harnesses historical trajectory data combined with a novel dynamic geometric graph-based behavior-aware module. At its core, an adaptive structure-aware interactive graph convolutional network captures both positional and behavioral features of road users, preserving spatial-temporal intricacies. Enhanced by a linear attention mechanism, the model achieves computational efficiency and reduced parameter overhead. Evaluations on the Argoverse, NGSIM, HighD, and MoCAD datasets underscore MFTraj's robustness and adaptability, outperforming numerous benchmarks even in data-challenged scenarios without the need for additional information such as HD maps or vectorized maps. Importantly, it maintains competitive performance even in scenarios with substantial missing data (12.5%-50%), outperforming most existing state-of-the-art models. The results and methodology suggest a significant advancement in autonomous driving trajectory prediction, paving the way for safer and efficient autonomous systems.


#22 SCTrans: Multi-scale scRNA-seq Sub-vector Completion Transformer for Gene-selective Cell Type Annotation [PDF] [Copy] [Kimi] [REL]

Authors: Lu Lin, Wen Xue, Xindian Wei, Wenjun Shen, Cheng Liu, Si Wu, Hau San Wong

Cell type annotation is pivotal to single-cell RNA sequencing data (scRNA-seq)-based biological and medical analysis, e.g., identifying biomarkers, exploring cellular heterogeneity, and understanding disease mechanisms. The previous annotation methods typically learn a nonlinear mapping to infer cell type from gene expression vectors, and thus fall short in discovering and associating salient genes with specific cell types. To address this issue, we propose a multi-scale scRNA-seq Sub-vector Completion Transformer, and our model is referred to as SCTrans. Considering that the expressiveness of gene sub-vectors is richer than that of individual genes, we perform multi-scale partitioning on gene vectors followed by masked sub-vector completion, conditioned on unmasked ones. Toward this end, the multi-scale sub-vectors are tokenized, and the intrinsic contextual relationships are modeled via self-attention computation and conditional contrastive regularization imposed on an encoding transformer. By performing mutual learning between the encoder and an additional lightweight counterpart, the salient tokens can be distinguished from the others. As a result, we can perform gene-selective cell type annotation, which contributes to our superior performance over state-of-the-art annotation methods.


#23 Enhancing Multimodal Knowledge Graph Representation Learning through Triple Contrastive Learning [PDF1] [Copy] [Kimi1] [REL]

Authors: Yuxing Lu, Weichen Zhao, Nan Sun, Jinzhuo Wang

Multimodal knowledge graphs incorporate multimodal information rather than pure symbols, which significantly enhance the representation of knowledge graphs and their capacity to understand the world. Despite these advancements, existing multimodal fusion techniques still face significant challenges in representing modalities and fully integrating the diverse attributes of entities, particularly when dealing with more than one modality. To address this issue, this article proposes a Knowledge Graph Multimodal Representation Learning (KG-MRI) method. This method utilizes foundation models to represent different modalities and incorporates a triple contrastive learning model and a dual-phase training strategy to effectively fuse the different modalities with knowledge graph embeddings. We conducted comprehensive comparisons with several different knowledge graph embedding methods to validate the effectiveness of our KG-MRI model. Furthermore validation on a real-world Non-Alcohol Fatty Liver Disease (NAFLD) cohort demonstrated that the vector representations learned through our methodology possess enhanced representational capabilities, showing promise for broader applications in complex multimodal environments.


#24 Enhanced DouDiZhu Card Game Strategy Using Oracle Guiding and Adaptive Deep Monte Carlo Method [PDF1] [Copy] [Kimi] [REL]

Authors: Qian Luo, Tien Ping Tan, Daochen Zha, Tianqiao Zhang

Deep Reinforcement Learning (DRL) exhibits significant advancements in games with both perfect and imperfect information, such as Go, Chess, Texas Hold'em, and Dota2. However, DRL encounters considerable challenges when tackling card game DouDiZhu because of the imperfect information, large state-action space, and the sparse reward issue. This paper presents OADMCDou, which combines Oracle Guiding and Adaptive Deep Monte Carlo Method to address the challenges in DouDiZhu. Oracle Guiding trains an Oracle agent with both imperfect and perfect information, gradually reducing reliance on imperfect information to transition to a standard agent. Adaptive Deep Monte Carlo uses gradient weight clipping and constrains the magnitude of updates to prevent extreme policy updates. We conduct extensive experiments to evaluate the effectiveness of the proposed methods, demonstrating OADMCDou's superior performance over the state-of-the-art DouDiZhu AI, DouZero. This superiority over DouZero is reflected in two metrics: a 95% confidence interval of 0.104 ± 0.041 for performance, and a 28.6% reduction in loss.


#25 Towards Geometric Normalization Techniques in SE(3) Equivariant Graph Neural Networks for Physical Dynamics Simulations [PDF1] [Copy] [Kimi] [REL]

Authors: Ziqiao Meng, Liang Zeng, Zixing Song, Tingyang Xu, Peilin Zhao, Irwin King

SE(3) equivariance is a fundamental property that is highly desirable to maintain in physical dynamics modeling. This property ensures neural outputs to remain robust when the inputs are translated or rotated. Recently, there have been several proposals for SE(3) equivariant graph neural networks (GNNs) that have shown promising results in simulating particle dynamics. However, existing works have neglected an important issue that current SE(3) equivariant GNNs cannot scale to large particle systems. Although some simple normalization techniques are already in use to stabilize the training dynamics of equivariant graph networks, they actually break the SE(3) equivariance of the architectures. In this work, we first show the numerical instability of training equivariant GNNs on large particle systems and then analyze some existing normalization strategies adopted in modern works. We propose a new normalization layer called GeoNorm, which can satisfy the SE(3) equivariance and simultaneously stabilize the training process. We conduct comprehensive experiments on N-body system simulation tasks with larger particle system sizes. The experimental results demonstrate that GeoNorm successfully preserves the SE(3) equivariance compared to baseline techniques and stabilizes the training dynamics of SE(3) equivariant GNNs on large systems.