AAAI.2026 - Machine Learning

Total: 1097

#1 FedGRPO: Privately Optimizing Foundation Models with Group-Relative Rewards from Domain Clients

Authors: Gongxi Zhu, Hanlin Gu, Lixin Fan, Qiang Yang, Yuxing Han

One important direction of Federated Foundation Models (FedFMs) is leveraging data from small client models to enhance the performance of a large server-side foundation model. Existing methods based on model-level or representation-level knowledge transfer either require expensive local training or incur high communication costs, and they introduce unavoidable privacy risks. We reformulate this problem as a reinforcement-learning-style evaluation process and propose FedGRPO, a privacy-preserving framework comprising two modules. The first module performs competence-based expert selection by building a lightweight confidence graph from auxiliary data to identify the most suitable clients for each question. The second module leverages the "Group Relative" concept from the Group Relative Policy Optimization (GRPO) framework by packaging each question together with its solution rationale into candidate policies, dispatching these policies to a selected subset of expert clients, and aggregating only the resulting scalar reward signals via a federated group-relative loss function. By exchanging reward values instead of data or model updates, FedGRPO reduces privacy risk and communication overhead while enabling parallel evaluation across heterogeneous devices. Empirical results on diverse domain tasks demonstrate that FedGRPO achieves superior downstream accuracy and communication efficiency compared to conventional FedFM baselines.
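The "Group Relative" idea can be illustrated with a minimal sketch: each client's scalar reward is normalized against the mean and standard deviation of its group, so only relative quality within the group drives the update. This is a generic GRPO-style advantage computation, not FedGRPO's exact aggregation rule.

```python
import math

def group_relative_advantages(rewards):
    """Normalize each scalar reward against its group's mean and
    standard deviation (GRPO-style group-relative advantage)."""
    mean = sum(rewards) / len(rewards)
    var = sum((r - mean) ** 2 for r in rewards) / len(rewards)
    std = math.sqrt(var) + 1e-8  # small epsilon avoids division by zero
    return [(r - mean) / std for r in rewards]

# Scalar rewards returned by expert clients for one group of candidate rationales:
advs = group_relative_advantages([0.2, 0.8, 0.5, 0.5])
```

Because only these scalar advantages (not data or gradients) would cross the federation boundary, the communication cost per question is a handful of floats.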

Subject: AAAI.2026 - Machine Learning


#2 EfficientFlow: Efficient Equivariant Flow Policy Learning for Embodied AI

Authors: Jianlei Chang, Ruofeng Mei, Wei Ke, Xiangyu Xu

Generative modeling has recently shown remarkable promise for visuomotor policy learning, enabling flexible and expressive control across diverse embodied AI tasks. However, existing generative policies often struggle with data inefficiency, requiring large-scale demonstrations, and sampling inefficiency, incurring slow action generation during inference. We introduce EfficientFlow, a unified framework for efficient embodied AI with flow-based policy learning. To enhance data efficiency, we bring equivariance into flow matching. We theoretically prove that when using an isotropic Gaussian prior and an equivariant velocity prediction network, the resulting action distribution remains equivariant, leading to improved generalization and substantially reduced data demands. To accelerate sampling, we propose a novel acceleration regularization strategy. As direct computation of acceleration is intractable for marginal flow trajectories, we derive a novel surrogate loss that enables stable and scalable training using only conditional trajectories. Across a wide range of robotic manipulation benchmarks, the proposed algorithm achieves competitive or superior performance under limited data while offering dramatically faster inference. These results highlight EfficientFlow as a powerful and efficient paradigm for high-performance embodied AI.
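As a rough illustration of the flow-matching objective underlying such policies (assuming the common straight-line interpolation path; the paper's equivariant network and acceleration regularizer are not shown), a single-sample conditional flow-matching loss looks like:

```python
import random

def cfm_loss(velocity_net, x0, x1):
    """One-sample conditional flow-matching loss: interpolate between a
    prior sample x0 and a data action x1 at a random time t, then
    regress the predicted velocity toward the conditional target x1 - x0."""
    t = random.random()
    xt = [(1 - t) * a + t * b for a, b in zip(x0, x1)]
    target = [b - a for a, b in zip(x0, x1)]
    pred = velocity_net(xt, t)
    return sum((p - g) ** 2 for p, g in zip(pred, target)) / len(x0)

# A trivial "network" that already returns the straight-line velocity:
loss = cfm_loss(lambda xt, t: [1.0, -2.0], [0.0, 0.0], [1.0, -2.0])
```

Training only needs conditional trajectories like these, which is exactly why a surrogate for the intractable marginal acceleration (as the paper derives) is attractive.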

Subject: AAAI.2026 - Machine Learning


#3 Automatic Channel Pruning by Searching with Structure Embedding for Hash Network

Authors: Zifan Liu, Yuan Cao, Yifan Sun, Yanwei Yu, Heng Qi

Deep hash networks are widely used in tasks such as large-scale image retrieval due to the high search efficiency and low storage cost of binary hash codes. With the growing demand for deploying deep hash networks on resource-constrained devices, network compression becomes crucial, and automatic pruning is a natural choice because it preserves efficacy. However, existing pruning methods are mostly designed for image classification, whereas hash networks must generate compact binary codes, making each channel more sensitive to the retrieval objective. As a result, their performance often degrades when applied to image retrieval tasks. In this paper, we propose a novel Automatic Channel Pruning framework by Searching with Structure Embedding (ACP-SSE). To the best of our knowledge, this is the first study to explore pruning techniques for deep hash networks and the first automatic pruning method that searches based on network topology. Specifically, we first design a structure encoding model using Graph Convolutional Networks (GCNs), whose graph is constructed from the hash network and whose node features are initialized from pruning strategies. The model is trained efficiently with a contrastive learning loss, without the accuracy supervision that would require fine-tuning pruned models. In addition, we introduce a dynamic pruning search space that accounts for resource constraints. By converting automatic channel pruning into a search for a pruned structure whose effect is similar to that of the unpruned structure, the method adapts to various network architectures. Finally, the optimal networks are selected from the candidate set according to their performance on specific downstream tasks. Extensive experiments demonstrate that ACP-SSE is effective for automatic channel pruning, outperforming state-of-the-art baselines in hashing-based image retrieval while maintaining competitive accuracy in image classification.

Subject: AAAI.2026 - Machine Learning


#4 DOS: Distilling Observable Softmaps of Zipfian Prototypes for Self-Supervised Point Representation

Authors: Mohamed Abdelsamad, Michael Ulrich, Bin Yang, Miao Zhang, Yakov Miron, Abhinav Valada

Recent advances in self-supervised learning (SSL) have shown tremendous potential for learning 3D point cloud representations without human annotations. However, SSL for 3D point clouds still faces critical challenges due to irregular geometry, shortcut-prone reconstruction, and unbalanced semantic distributions. In this work, we propose DOS (Distilling Observable Softmaps), a novel SSL framework that self-distills semantic relevance softmaps only at observable (unmasked) points. This strategy prevents information leakage from masked regions and provides richer supervision than discrete token-to-prototype assignments. To address the challenge of unbalanced semantics in an unsupervised setting, we introduce Zipfian prototypes and incorporate them using a modified Sinkhorn-Knopp algorithm, Zipf-Sinkhorn, which enforces a power-law prior over prototype usage and modulates the sharpness of the target softmap during training. DOS outperforms current state-of-the-art methods on semantic segmentation and 3D object detection across multiple benchmarks, including nuScenes, Waymo, SemanticKITTI, ScanNet, and ScanNet200, without relying on extra data or annotations. Our results demonstrate that observable-point softmap distillation offers a scalable and effective paradigm for learning robust 3D representations.
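A minimal sketch of Sinkhorn iteration with a power-law (Zipfian) prior over prototype usage, in the spirit of Zipf-Sinkhorn; the exact algorithm, including the sharpness modulation, follows the paper rather than this toy version, and the 1/(rank+1) target is an illustrative assumption.

```python
import math

def zipf_sinkhorn(scores, iters=50):
    """Sinkhorn-style normalization of a (points x prototypes) assignment
    matrix whose prototype marginals follow a Zipfian power-law prior
    instead of the usual uniform one."""
    n, k = len(scores), len(scores[0])
    # Zipfian target usage for prototype j: proportional to 1 / (j + 1)
    z = [1.0 / (j + 1) for j in range(k)]
    col_target = [n * zj / sum(z) for zj in z]
    P = [[math.exp(s) for s in row] for row in scores]
    for _ in range(iters):
        # scale columns toward the Zipfian marginals
        for j in range(k):
            c = sum(P[i][j] for i in range(n))
            for i in range(n):
                P[i][j] *= col_target[j] / c
        # scale rows so each point distributes unit mass
        for i in range(n):
            r = sum(P[i])
            P[i] = [p / r for p in P[i]]
    return P

# Three points, two prototypes, uninformative (uniform) scores:
P = zipf_sinkhorn([[0.0, 0.0], [0.0, 0.0], [0.0, 0.0]])
```

Even with uninformative scores, the head prototype absorbs more mass than the tail one, which is the intended effect of the power-law prior.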

Subject: AAAI.2026 - Machine Learning


#5 Constrained Online Convex Optimization with Memory and Predictions

Authors: Mohammed Abdullah, George Iosifidis, Salah Eddine Elayoubi, Tijani Chahed

We study Constrained Online Convex Optimization with Memory (COCO-M), where both the loss and the constraints depend on a finite window of past decisions made by the learner. This setting extends the previously studied unconstrained online optimization with memory framework and captures practical problems such as the control of constrained dynamical systems and scheduling with reconfiguration budgets. For this problem, we propose the first algorithms that achieve sublinear regret and sublinear cumulative constraint violation under time-varying constraints, both with and without predictions of future loss and constraint functions. Without predictions, we introduce an adaptive penalty approach that guarantees sublinear regret and constraint violation. When short-horizon and potentially unreliable predictions are available, we reinterpret the problem as online learning with delayed feedback and design an optimistic algorithm whose performance improves as prediction accuracy improves, while remaining robust when predictions are inaccurate. Our results bridge the gap between classical constrained online convex optimization and memory-dependent settings, and provide a versatile learning toolbox with diverse applications.

Subject: AAAI.2026 - Machine Learning


#6 Expressive Temporal Specifications for Reward Monitoring

Authors: Omar Adalat, Francesco Belardinelli

Specifying informative and dense reward functions remains a pivotal challenge in Reinforcement Learning, as it directly affects the efficiency of agent training. In this work, we harness the expressive power of quantitative Linear Temporal Logic on finite traces to synthesize reward monitors that generate a dense stream of rewards for runtime-observable state trajectories. By providing nuanced feedback during training, these monitors guide agents toward optimal behaviour and help mitigate the well-known issue of sparse rewards under long-horizon decision making, which arises under the Boolean semantics dominating the current literature. Our framework is algorithm-agnostic and only relies on a state labelling function, and naturally accommodates specifying non-Markovian properties. Empirical results show that our quantitative monitors consistently subsume and, depending on the environment, outperform Boolean monitors in maximizing a quantitative measure of task completion and in reducing convergence time.

Subject: AAAI.2026 - Machine Learning


#7 ProbLog4Fairness: A Neurosymbolic Approach to Modeling and Mitigating Bias

Authors: Rik Adriaensen, Lucas Van Praet, Jessa Bekker, Robin Manhaeve, Pieter Delobelle, Maarten Buyl

Operationalizing definitions of fairness is difficult in practice, as multiple definitions can be incompatible while each being arguably desirable. Instead, it may be easier to directly describe algorithmic bias through ad-hoc assumptions specific to a particular real-world task, e.g., based on background information on systemic biases in its context. Such assumptions can, in turn, be used to mitigate this bias during training. Yet, a framework for incorporating such assumptions that is simultaneously principled, flexible, and interpretable is currently lacking. Our approach is to formalize bias assumptions as programs in ProbLog, a probabilistic logic programming language that allows for the description of probabilistic causal relationships through logic. Neurosymbolic extensions of ProbLog then allow for easy integration of these assumptions in a neural network's training process. We propose a set of templates to express different types of bias and show the versatility of our approach on synthetic tabular datasets with known biases. Using estimates of the bias distortions present, we also succeed in mitigating algorithmic bias in real-world tabular and image data. We conclude that ProbLog4Fairness outperforms baselines due to its ability to flexibly model the relevant bias assumptions, where other methods typically uphold a fixed bias type or notion of fairness.

Subject: AAAI.2026 - Machine Learning


#8 RefiDiff: Progressive Refinement Diffusion for Efficient Missing Data Imputation

Authors: Md Atik Ahamed, Qiang Ye, Qiang Cheng

Missing values in high-dimensional, mixed-type datasets pose significant challenges for data imputation, particularly under Missing Not At Random (MNAR) mechanisms. Existing methods struggle to integrate local and global data characteristics, limiting performance in MNAR and high-dimensional settings. We propose an innovative framework, RefiDiff, that combines local machine-learning predictions with a novel Mamba-based denoising network, which efficiently captures long-range dependencies among features and samples at low computational cost. RefiDiff bridges the predictive and generative paradigms of imputation, leveraging pre-refinement for initial warm-up imputations and post-refinement to polish results, enhancing stability and accuracy. By encoding mixed-type data into unified tokens, RefiDiff enables robust imputation without architectural or hyperparameter tuning. RefiDiff outperforms state-of-the-art (SOTA) methods across missing-value settings, demonstrating strong performance in MNAR settings and superior out-of-sample generalization. Extensive evaluations on nine real-world datasets demonstrate its robustness, scalability, and effectiveness in handling complex missingness patterns.

Subject: AAAI.2026 - Machine Learning


#9 Stabilizing Policy Gradient Methods via Reward Profiling

Authors: Shihab Ahmed, El Houcine Bergou, Yue Wang, Aritra Dutta

Policy gradient methods, which have been extensively studied in the last decade, offer an effective and efficient framework for reinforcement learning problems. However, their performance can often be unsatisfactory, suffering from unreliable reward improvements and slow convergence due to high variance in gradient estimations. In this paper, we propose a universal reward profiling framework that can be seamlessly integrated with any policy gradient algorithm, where we selectively update the policy based on high-confidence performance estimations. We theoretically justify that our technique will not slow down the convergence of the baseline policy gradient methods, but with high probability will result in stable and monotonic improvements of their performance. Empirically, on eight continuous-control benchmarks (Box2D and MuJoCo/PyBullet), our profiling yields up to 1.5x faster convergence to near-optimal returns and up to a 1.75x reduction in return variance in some setups. Our profiling approach offers a general, theoretically grounded path to more reliable and efficient policy learning in complex environments.
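The idea of selectively accepting updates under high-confidence performance estimates can be sketched as a simple gate: a candidate policy is accepted only when its estimated return beats the incumbent's by more than the combined confidence half-width. The z-score criterion below is an illustrative stand-in, not the paper's actual profiling test.

```python
import math

def confident_improvement(old_returns, new_returns, z=1.96):
    """Accept a candidate policy only if its mean return exceeds the
    incumbent's by more than z times the combined standard error."""
    def mean_se(xs):
        m = sum(xs) / len(xs)
        var = sum((x - m) ** 2 for x in xs) / max(len(xs) - 1, 1)
        return m, math.sqrt(var / len(xs))
    m_old, se_old = mean_se(old_returns)
    m_new, se_new = mean_se(new_returns)
    return m_new - m_old > z * math.hypot(se_old, se_new)

# A clear improvement is accepted; a within-noise change is not:
accept = confident_improvement([10, 11, 9, 10], [15, 16, 14, 15])
reject = confident_improvement([10, 11, 9, 10], [10.2, 10.1, 9.9, 10.0])
```

Gating updates this way trades a few skipped steps for monotonic, low-variance progress, which matches the stability claim above.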

Subject: AAAI.2026 - Machine Learning


#10 Expressive Power of Graph Transformers via Logic

Authors: Veeti Ahvonen, Maurice Funk, Damian Heiman, Antti Kuusisto, Carsten Lutz

Transformers are the basis of modern large language models, but relatively little is known about their precise expressive power on graphs. We study the expressive power of the graph transformers (GTs) of Dwivedi and Bresson (2020) and the GPS-networks of Rampásek et al. (2022), both under soft attention and average hard attention. Our study covers two scenarios: the theoretical setting with real numbers and the more practical case with floats. With reals, we show that, when restricted to vertex properties definable in first-order logic (FO), GPS-networks have the same expressive power as graded modal logic (GML) with the global modality. With floats, GPS-networks turn out to be equally expressive as GML with the counting global modality. The latter result is absolute, not restricted to properties definable in a background logic. We also obtain similar characterizations for GTs in terms of propositional logic with the global modality (for reals) and the counting global modality (for floats).

Subject: AAAI.2026 - Machine Learning


#11 PharmaQA: Prompt-Based Molecular Representation Learning via Pharmacophore-Oriented Question Answering

Authors: Chengwei Ai, Qiaozhen Meng, Mengwei Sun, Ruihan Dong, Hongpeng Yang, Shiqiang Ma, Xiaoyi Liu, Cheng Liang, Fei Guo

Molecular representation plays a central role in computational drug discovery. Pharmacophores, the functional groups responsible for molecular bioactivity, have been widely studied in cheminformatics. However, their incorporation into molecular representation learning, particularly in contexts requiring reasoning or generalization, remains relatively limited. To address this gap, we propose PharmaQA, a pharmacophore-oriented question answering framework that formulates tailored prompts to extract context-aware molecular semantics. Rather than encoding pharmacophore features directly, PharmaQA learns to answer pharmacophore-related queries. This design enables flexible reasoning across diverse tasks, including molecular property prediction, compound-target interaction prediction, and binding affinity estimation. Experimental results on benchmark datasets demonstrate that PharmaQA achieves competitive performance. In a ligand discovery case study using FDA-approved compounds, the framework identified potential inhibitors for three therapeutic targets, with strong docking performance. As a generalizable and modular solution, PharmaQA incorporates pharmacophoric knowledge into molecular embeddings, enhancing both predictive accuracy and interpretability in drug discovery applications.

Subject: AAAI.2026 - Machine Learning


#12 PAGE: A Unified Approach for Federated Graph Unlearning

Authors: Yuming Ai, Xunkai Li, Jiaqi Chao, Bowen Fan, Zhengyu Wu, Yinlin Zhu, Rong-Hua Li, Guoren Wang

Federated graph learning (FGL) is a distributed framework for graph representation learning that prioritizes privacy preservation. The right to be forgotten embodies the ethical principle of prioritizing user autonomy over data usage. In the context of FGL, upholding this right requires removing specific entities and their associated knowledge within local subgraphs (Meta Unlearning) as well as completely erasing an entire client (Client Unlearning). We are the first to systematically define these two unlearning requests in federated graph unlearning. Several studies have attempted to address this challenge, but key limitations persist: incomplete unlearning support and residual knowledge permeation. To this end, we propose the Prototype-guided Adversarial Graph Eraser (PAGE), the first unified federated graph unlearning framework that extends to both types of unlearning requests. For meta unlearning, prototype gradients guide the initial local unlearning, while adversarial graphs eliminate residual knowledge across the influenced clients. For client unlearning, PAGE exclusively utilizes adversarial graph generation to purge a departed client's influence from the remaining participants. PAGE outperforms existing methods on 8 benchmark datasets: it improves prediction accuracy by 5.08% (client unlearning) and 1.50% (meta unlearning), with up to an 11.84% gain on large-scale graphs. Furthermore, ablation studies confirm its efficacy as a plug-in for other meta-unlearning methods, boosting prediction performance by up to 4.49% and unlearning performance by up to 7.22%.

Subject: AAAI.2026 - Machine Learning


#13 InfoQ: Mixed-Precision Quantization via Global Information Flow

Authors: Mehmet Emre Akbulut, Hazem Hesham Yousef Shalby, Fabrizio Pittorino, Manuel Roveri

Mixed-precision quantization (MPQ) is crucial for deploying deep neural networks on resource-constrained devices, but finding the optimal bit-width for each layer represents a complex combinatorial optimization problem. Current state-of-the-art methods rely on computationally expensive search algorithms or local sensitivity heuristic proxies like the Hessian, which fail to capture the cascading global effects of quantization error. In this work, we argue that the quantization sensitivity of a layer should not be measured by its local properties, but by its impact on the information flow throughout the entire network. We introduce InfoQ, a novel framework for mixed-precision quantization that is training-free in the bit-width search phase. InfoQ assesses layer importance by performing a single forward pass to measure the change in mutual information in the remaining part of the network, thus creating a global sensitivity score. This approach directly quantifies how quantizing one layer degrades the information characteristics of subsequent layers. The resulting scores are used to formulate bit-width allocation as an integer linear programming problem, which is solved efficiently to minimize total sensitivity under a given budget (e.g., model size or BitOps). Our retraining-free search phase provides a superior search-time/accuracy trade-off (using two orders of magnitude less data compared to state-of-the-art methods such as LIMPQ), while yielding up to a 1% accuracy improvement for MobileNetV2 and ResNet18 on ImageNet at high compression rates (14.00x and 10.66x).
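The bit-width allocation step can be sketched as a tiny constrained search: given per-layer sensitivity scores at each candidate bit-width and a size budget, pick the assignment minimizing total sensitivity. Brute force stands in for the ILP solver here for clarity, and the sensitivity numbers are invented for illustration.

```python
from itertools import product

def allocate_bits(sens, sizes, budget, choices=(2, 4, 8)):
    """Choose a bit-width per layer minimizing total sensitivity subject
    to a model-size budget.
    sens[l][b]  = sensitivity score of layer l at bit-width b
    sizes[l]    = parameter count of layer l (size cost = sizes[l] * b)"""
    best, best_cost = None, float("inf")
    for assign in product(choices, repeat=len(sizes)):
        if sum(s * b for s, b in zip(sizes, assign)) > budget:
            continue  # violates the size budget
        cost = sum(sens[l][b] for l, b in enumerate(assign))
        if cost < best_cost:
            best, best_cost = assign, cost
    return best

# Layer 0 is highly sensitive to quantization, layer 1 is not:
sens = [{2: 9.0, 4: 3.0, 8: 0.5}, {2: 1.0, 4: 0.4, 8: 0.1}]
bits = allocate_bits(sens, sizes=[100, 100], budget=1000)
```

Under the budget, the sensitive layer is kept at 8 bits while the robust one drops to 2, which is the qualitative behavior the ILP formulation produces at scale.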

Subject: AAAI.2026 - Machine Learning


#14 Symmetric Aggregation of Conformity Scores for Efficient Uncertainty Sets

Authors: Nabil Alami, Jad Zakharia, Souhaib Ben Taieb

Access to multiple predictive models trained for the same task, whether in regression or classification, is increasingly common in many applications. Aggregating their predictive uncertainties to produce reliable and efficient uncertainty quantification is therefore a critical but still underexplored challenge, especially within the framework of conformal prediction (CP). While CP methods can generate individual prediction sets from each model, combining them into a single, more informative set remains a challenging problem. To address this, we propose SACP (Symmetric Aggregated Conformal Prediction), a novel method that aggregates nonconformity scores from multiple predictors. SACP transforms these scores into e-values and combines them using any symmetric aggregation function. This flexible design enables a robust, data-driven framework for selecting aggregation strategies that yield sharper prediction sets. We also provide theoretical insights that help justify the validity and performance of the SACP approach. Extensive experiments on diverse datasets show that SACP consistently improves efficiency and often outperforms state-of-the-art model aggregation baselines.
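A minimal sketch of the score-to-e-value pipeline, assuming a standard split-conformal p-value, the known calibrator e(p) = 1/(2*sqrt(p)), and the mean as the symmetric aggregator; SACP's data-driven choice among aggregation functions is not reproduced here.

```python
import math

def conformal_p(score, calib_scores):
    """Split-conformal p-value for a candidate's nonconformity score."""
    n = len(calib_scores)
    return (1 + sum(s >= score for s in calib_scores)) / (n + 1)

def p_to_e(p):
    """A valid p-to-e calibrator: e(p) = 1 / (2 * sqrt(p))."""
    return 1.0 / (2.0 * math.sqrt(p))

def in_aggregated_set(scores_per_model, calib_per_model, alpha=0.1):
    """Keep a candidate label while the symmetric (mean) aggregation of
    per-model e-values stays below 1/alpha; Markov's inequality then
    bounds the miscoverage by alpha."""
    es = [p_to_e(conformal_p(s, c))
          for s, c in zip(scores_per_model, calib_per_model)]
    return sum(es) / len(es) < 1.0 / alpha

calib = [list(range(1, 10000))] * 2        # two models, shared calibration scores
conforming = in_aggregated_set([2, 3], calib)           # typical scores: kept
nonconforming = in_aggregated_set([10**6, 10**6], calib)  # extreme scores: rejected
```

Because the mean (or any symmetric function with the right properties) of e-values remains usable with Markov's inequality, the aggregation stays valid regardless of which predictor produced each score.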

Subject: AAAI.2026 - Machine Learning


#15 Removing Box-Free Watermarks for Image-to-Image Models via Query-Based Reverse Engineering

Authors: Haonan An, Guang Hua, Hangcheng Cao, Zhengru Fang, Guowen Xu, Susanto Rahardja, Yuguang Fang

The intellectual property of deep generative networks (GNets) can be protected by a cascaded hiding network (HNet) that embeds watermarks (or marks) into GNet outputs, a scheme known as box-free watermarking. Both GNet and HNet are encapsulated in a black box (called the operation network, or ONet), with only the generated and marked outputs from HNet released to end users, and the scheme is deemed secure. In this paper, we reveal an overlooked vulnerability in such systems. Specifically, we show that the hidden GNet outputs can still be reliably estimated via query-based reverse engineering, leaking the generated but unmarked images, despite the attacker's limited knowledge of the system. Our first attempt reverse-engineers an inverse model of HNet under the stringent black-box condition, for which we propose to exploit the query process with specially curated input images. While effective, this method yields unsatisfactory image quality. To improve it, we then propose an alternative method that leverages the equivalent additive property of box-free model watermarking and reverse-engineers a forward surrogate model of HNet, with better image-quality preservation. Extensive experimental results on image processing and image generation tasks demonstrate that both attacks achieve impressive watermark removal success rates (100%) while maintaining excellent image quality (reaching a PSNR of up to 34.69 dB), substantially outperforming existing attacks and highlighting the urgent need for robust defensive strategies to mitigate the identified vulnerability in box-free model watermarking.

Subject: AAAI.2026 - Machine Learning


#16 FreDN: Spectral Disentanglement for Time Series Forecasting via Learnable Frequency Decomposition

Authors: Zhongde An, Jinhong You, Jiyanglin Li, Yiming Tang, Wen Li, Heming Du, Shouguo Du

Time series forecasting is essential in a wide range of real-world applications. Recently, frequency-domain methods have attracted increasing interest for their ability to capture global dependencies. However, when applied to non-stationary time series, these methods suffer from spectral entanglement and the computational burden of complex-valued learning. Spectral entanglement refers to the overlap of trend, periodic, and noise components across the spectrum, caused by spectral leakage and non-stationarity, and existing decompositions are not suited to resolving it. To address this, we propose the Frequency Decomposition Network (FreDN), which introduces a learnable Frequency Disentangler module to separate trend and periodic components directly in the frequency domain. Furthermore, we propose a theoretically supported ReIm Block to reduce the complexity of complex-valued operations while maintaining performance. We also re-examine the frequency-domain loss function and provide new theoretical insights into its effectiveness. Extensive experiments on seven long-term forecasting benchmarks demonstrate that FreDN outperforms state-of-the-art methods by up to 10%. Furthermore, compared with standard complex-valued architectures, our real-imaginary shared-parameter design reduces the parameter count and computational cost by at least 50%.
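For intuition, here is a naive DFT-based decomposition with a hard low-pass cutoff: the kind of fixed spectral boundary that FreDN replaces with a learnable disentangler. The O(n^2) DFT and the cutoff of 2 are illustrative choices only.

```python
import cmath

def dft(x):
    """Naive O(n^2) discrete Fourier transform."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
            for k in range(n)]

def idft(X):
    """Inverse DFT, returning real parts."""
    n = len(X)
    return [sum(X[k] * cmath.exp(2j * cmath.pi * k * t / n)
                for k in range(n)).real / n for t in range(n)]

def split_trend_periodic(x, cutoff=2):
    """Fixed low-pass split: frequencies below `cutoff` (plus their
    conjugate mirrors) become the trend; the residual holds the
    periodic/noise part."""
    X = dft(x)
    n = len(X)
    low = [X[k] if (k < cutoff or k > n - cutoff) else 0 for k in range(n)]
    trend = idft(low)
    periodic = [a - b for a, b in zip(x, trend)]
    return trend, periodic

# A ramp (trend) plus a fast alternating oscillation (periodic part):
x = [t * 0.5 + (1 if t % 2 else -1) for t in range(16)]
trend, periodic = split_trend_periodic(x)
```

A fixed cutoff like this leaks energy between components whenever the series is non-stationary, which is precisely the entanglement a learnable split is meant to resolve.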

Subject: AAAI.2026 - Machine Learning


#17 SOSControl: Enhancing Human Motion Generation Through Saliency-Aware Symbolic Orientation and Timing Control

Authors: Ho Yin Au, Junkun Jiang, Jie Chen

Traditional text-to-motion frameworks often lack precise control, and existing approaches based on joint keyframe locations provide only positional guidance, making it challenging and unintuitive to specify body part orientations and motion timing. To address these limitations, we introduce the Salient Orientation Symbolic (SOS) script, a programmable symbolic framework for specifying body part orientations and motion timing at keyframes. We further propose an automatic SOS extraction pipeline that employs temporally-constrained agglomerative clustering for frame saliency detection and a Saliency-based Masking Scheme (SMS) to generate sparse, interpretable SOS scripts directly from motion data. Moreover, we present the SOSControl framework, which treats the available orientation symbols in the sparse SOS script as salient and prioritizes satisfying these constraints during motion generation. By incorporating SMS-based data augmentation and gradient-based iterative optimization, the framework enhances alignment with user-specified constraints. Additionally, it employs a ControlNet-based ACTOR-PAE Decoder to ensure smooth and natural motion outputs. Extensive experiments demonstrate that the SOS extraction pipeline generates human-interpretable scripts with symbolic annotations at salient keyframes, while the SOSControl framework outperforms existing baselines in motion quality, controllability, and generalizability with respect to motion timing and body part orientation control.

Subject: AAAI.2026 - Machine Learning


#18 Spectral Basis Learning for Expressive Graph Neural Networks in Link Prediction

Authors: Niloofar Azizi, Nils M. Kriege, Nicholas J. A. Harvey, Horst Bischof

Graph Neural Networks (GNNs) excel in handling graph-structured data but often underperform in link prediction tasks compared to classical methods, mainly due to the limitations of the commonly used message-passing principle. Notably, their ability to distinguish non-isomorphic graphs is limited by the 1-dimensional Weisfeiler-Lehman test (1-WL). Our study presents a novel method to enhance the expressivity of GNNs by embedding induced subgraphs into the eigenbasis of the graph Laplacian. We introduce a Learnable Lanczos algorithm with Linear Constraints (LLwLC), proposing two novel subgraph extraction strategies: encoding vertex-deleted subgraphs and applying Neumann eigenvalue constraints. For the former, we demonstrate the ability to distinguish graphs that are indistinguishable by 2-WL, while maintaining efficiency. The latter focuses on link representations, enabling differentiation between k-regular graphs and node automorphisms, a vital aspect for link prediction tasks. Our approach results in a lightweight architecture, reducing the need for extensive training datasets. Empirically, our method improves performance in challenging link prediction tasks across benchmark datasets, establishing its practical utility and supporting our theoretical findings. Notably, LLwLC achieves 20x and 10x speedups by requiring only 5% and 10% of the training data on the PubMed and OGBL-Vessel datasets, respectively, while remaining competitive with the state of the art.

Subject: AAAI.2026 - Machine Learning


#19 Convergence of Fast Policy Iteration in Markov Games and Robust MDPs

Authors: Keith Badger, Jefferson Huang, Marek Petrik

Markov games and robust MDPs are closely related models that involve computing a pair of saddle point policies. As part of the long-standing effort to develop efficient algorithms for these models, the Filar-Tolwinski (FT) algorithm has shown considerable promise. As our first contribution, we demonstrate that FT may fail to converge to a saddle point and may loop indefinitely, even in small games. This observation contradicts the proof of FT's optimality in the original paper. As our second contribution, we then propose Residual Conditioned Policy Iteration (RCPI). RCPI builds on FT, but is guaranteed to converge to a saddle point. Our numerical results show that RCPI outperforms other convergent algorithms by several orders of magnitude.
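For the pure-strategy special case, a saddle point of a matrix game (row player maximizing, column player minimizing) is an entry that is simultaneously the minimum of its row and the maximum of its column. The check below covers only this classical case; FT and RCPI operate on the general stochastic-game setting with mixed policies.

```python
def pure_saddle_points(A):
    """Return all (row, col) index pairs that form a pure-strategy saddle
    point: the entry is a minimum of its row (the column player cannot
    improve) and a maximum of its column (the row player cannot improve)."""
    points = []
    for i, row in enumerate(A):
        for j, v in enumerate(row):
            if v == min(row) and v == max(A[k][j] for k in range(len(A))):
                points.append((i, j))
    return points

# Row player maximizes the payoff, column player minimizes it:
A = [[3, 1, 4],
     [2, 2, 2],
     [0, 1, 5]]
saddles = pure_saddle_points(A)
```

When no pure saddle exists, the value is only attained by mixed policies, which is exactly where iterative schemes such as FT (and the convergent RCPI) come into play.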

Subject: AAAI.2026 - Machine Learning


#20 Mechanistic Dissection of Cross-Attention Subspaces in Text-to-Image Diffusion Models

Authors: Jun-Hyun Bae, Wonyong Jo, Jaehyup Lee, Heechul Jung

Text-to-image diffusion models utilize cross-attention to integrate textual information into the visual latent space, yet the transformation from text embeddings to latent features remains largely unexplored. We provide a mechanistic analysis of the output-value (OV) circuits within cross-attention layers through spectral analysis via singular value decomposition. Our analysis reveals that semantic concepts are encoded in low-dimensional subspaces spanned by singular vectors in OV circuits across cross-attention heads. To verify this, we intervene on concept-related components in the diffusion process, demonstrating that intervention on the identified spectral components produces conceptual changes. We further validate these findings by examining visual outputs of isolated subspaces and their alignment with the text embedding space. Through this mechanistic understanding, we demonstrate that nullifying these spectral components alone can achieve targeted concept removal with performance comparable to existing methods while providing interpretability. Our work reveals how cross-attention layers encode semantic concepts in spectral subspaces of OV circuits, providing mechanistic insights and enabling precise concept manipulation without retraining.

Subject: AAAI.2026 - Machine Learning


#21 Medical Vision–Language Pretraining with LLM-Guided Temporal Supervision

Authors: Liang Bai, Zhi Wang, Huimin Yan, Xian Yang

Medical vision–language pretraining typically relies on static image–text pairs, overlooking temporal cues vital for understanding clinical progression. This limits model sensitivity to evolving semantics and reduces their effectiveness in real-world clinical reasoning. To address this challenge, we propose TAMM—a temporal alignment framework that leverages weak but semantically rich supervision from large language models (LLMs). Given temporally adjacent clinical reports, LLMs automatically generate (i) coarse-grained trend labels (e.g., improving or worsening), and (ii) fine-grained rationales explaining the supporting clinical evidence. These complementary signals inject temporal semantics without requiring manual annotation, and guide vision–language representation learning to capture trend-sensitive cross-modal alignment and rationale-grounded coherence. Experiments on multiple medical benchmarks demonstrate that TAMM improves retrieval and classification performance while yielding more interpretable, temporally consistent embeddings. Our results highlight the potential of leveraging LLM-derived supervision to equip vision–language models with temporal awareness critical for clinical applications.

Subject: AAAI.2026 - Machine Learning


#22 Multi-Level Domain Adaptation and Contrastive Domain Isolation with Bilinear Fusion for Patient Drug Response Prediction [PDF] [Copy] [Kimi] [REL]

Authors: Yuting Bai, Hanwen Lv, Wanwan Shi, Zhiyi Zou, Jiawei Luo

Accurate prediction of patient drug response is critical for precision cancer medicine but remains constrained by limited clinical data. While in vitro cell line data offer a scalable alternative, effective cross-domain transfer remains challenging. Many existing methods tend to overlook heterogeneous domain shifts across biological contexts, underrepresent the intrinsic differences between cell lines and patient tissues, and insufficiently capture high-order gene-drug interactions. To address these challenges, we propose MACB-DRP, a hierarchical transfer learning framework comprising three complementary stages that progressively coordinate adaptation across tissue, drug, and sample levels while enabling representation separation. The framework begins with tissue-aware domain adaptation, leveraging cancer-type classification and unsupervised alignment to preserve biologically meaningful structure across domains. It then incorporates drug-conditioned adversarial transfer for distribution alignment, coupled with bilinear fusion to model nonlinear and high-order gene-drug interactions. Finally, contrastive anchoring with feature-matched pairs enables fine-grained sample-level alignment, while feature-mismatched negatives preserve irreducible biological disparities. Experimental evaluation demonstrates that MACB-DRP achieves strong predictive performance for patient drug responses, with robust results across multiple cancer types and nine drugs, and further reveals hierarchical structure across drugs and tissues in visualization analyses. These findings highlight the potential of biologically guided domain adaptation for improving translational pharmacogenomics.
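The bilinear fusion step mentioned above pairs each gene-expression embedding with each drug embedding through a learned third-order tensor, so the fused vector captures multiplicative gene-drug interactions rather than a simple concatenation. A minimal numpy sketch, with shapes and names that are illustrative assumptions rather than MACB-DRP's exact parameterization:

```python
import numpy as np

def bilinear_fusion(gene_expr, drug_feat, W):
    """Bilinear gene-drug interaction features.

    gene_expr: (d_g,) gene-expression embedding.
    drug_feat: (d_d,) drug descriptor embedding.
    W: (k, d_g, d_d) learned bilinear tensor (illustrative shapes).
    Returns a (k,) fused vector z with z[i] = gene_expr^T W[i] drug_feat.
    """
    return np.einsum('g,kgd,d->k', gene_expr, W, drug_feat)

rng = np.random.default_rng(1)
g_vec = rng.standard_normal(5)          # toy gene embedding
d_vec = rng.standard_normal(3)          # toy drug embedding
W_bil = rng.standard_normal((4, 5, 3))  # toy bilinear tensor, k = 4
z = bilinear_fusion(g_vec, d_vec, W_bil)
```

Each output dimension is a full quadratic form over the two inputs, which is what gives the model access to high-order interaction terms.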

Subject: AAAI.2026 - Machine Learning


#23 Collaborative Dual Representations for Semi-Supervised Partial Label Learning [PDF] [Copy] [Kimi] [REL]

Authors: Wei-Xuan Bao, Yong Rui, Min-Ling Zhang

Semi-supervised partial label learning (SSPLL) aims to improve the generalization performance of partial label (PL) classifiers by effectively leveraging unlabeled data. Nevertheless, the inherent ambiguity in supervision, where the ground-truth label of a PL example is hidden within a set of candidate labels, poses significant challenges. The presence of false positive labels can mislead the model's judgment, resulting in pronounced confirmation bias. To address these issues, we propose a novel approach named CODUAL, which jointly learns a pair of dual representations for each instance: the predictive class distribution and the low-dimensional embedding. The dual representations interact and progress collaboratively during training. On one hand, in the embedding space the class prototypes are derived via solving a tailored empirical distance minimization problem and employed to smooth the pseudo-targets of unlabeled instances. On the other hand, the refined class distributions regularize the embedding space via encouraging instances with similar pseudo-targets to exhibit similar embeddings. Through an in-depth analysis, we provide, to the best of our knowledge, the first theoretical explanation of how collaborative dual representations facilitate more effective use of unlabeled data for disambiguation. Extensive experiments over benchmark datasets validate the superiority of our proposed approach.
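The prototype-then-smooth interaction between the two representations can be sketched as follows. For illustration, prototypes are computed as pseudo-target-weighted mean embeddings (a plain weighted-average surrogate, not the paper's tailored distance-minimization solver), and pseudo-targets are smoothed via a softmax over negative squared distances to the prototypes:

```python
import numpy as np

def class_prototypes(embeddings, pseudo_targets):
    """Prototype per class as a pseudo-target-weighted mean embedding.

    embeddings: (n, d); pseudo_targets: (n, c) rows summing to 1.
    Weighted-average surrogate for CODUAL's distance-minimization problem.
    """
    weights = pseudo_targets / pseudo_targets.sum(axis=0, keepdims=True)
    return weights.T @ embeddings  # (c, d)

def smooth_pseudo_targets(embeddings, prototypes, temperature=1.0):
    """Smooth pseudo-targets toward prototype similarity in embedding space."""
    # Negative squared distances to each prototype act as class logits.
    d2 = ((embeddings[:, None, :] - prototypes[None, :, :]) ** 2).sum(-1)
    logits = -d2 / temperature
    logits -= logits.max(axis=1, keepdims=True)  # numerical stability
    p = np.exp(logits)
    return p / p.sum(axis=1, keepdims=True)

rng = np.random.default_rng(2)
emb = rng.standard_normal((6, 4))               # toy instance embeddings
pt = np.abs(rng.standard_normal((6, 3)))
pt /= pt.sum(axis=1, keepdims=True)             # toy pseudo-targets
protos = class_prototypes(emb, pt)
smoothed = smooth_pseudo_targets(emb, protos)
```

Iterating these two steps is the "collaborative" loop: prototypes refine pseudo-targets, and the refined targets in turn reshape the embedding space.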

Subject: AAAI.2026 - Machine Learning


#24 Revisiting (Un)Fairness in Recourse by Minimizing Worst-Case Social Burden [PDF] [Copy] [Kimi] [REL]

Authors: Ainhize Barrainkua, Giovanni De Toni, Jose A. Lozano, Novi Quadrianto

Machine learning-based predictions are increasingly used in sensitive decision-making applications that directly affect our lives. This has led to extensive research into ensuring the fairness of classifiers. Beyond fair classification, emerging legislation now mandates that when a classifier delivers a negative decision, it must also offer actionable steps an individual can take to reverse that outcome. This concept is known as algorithmic recourse. Nevertheless, many researchers have expressed concerns about the fairness guarantees within the recourse process itself. In this work, we provide a theoretical characterization of unfairness in algorithmic recourse, formally linking fairness guarantees in recourse and classification, and highlighting limitations of the standard equal-cost paradigm. We then introduce a novel fairness framework based on social burden, along with a practical algorithm (MISOB), broadly applicable under real-world conditions. Empirical results on real-world datasets show that MISOB reduces the social burden across all groups without compromising overall classifier accuracy.
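One simple reading of the worst-case social-burden objective in the title is the maximum, over demographic groups, of the average recourse cost borne by that group's negatively classified individuals. A toy sketch (variable names are illustrative, not the paper's formalization):

```python
import numpy as np

def worst_case_social_burden(recourse_costs, groups):
    """Max over groups of the mean recourse cost within the group.

    recourse_costs: (n,) cost for each negatively classified individual.
    groups: (n,) group membership of each individual.
    Returns the worst-case burden and the per-group burdens.
    """
    burdens = {g: float(np.mean(recourse_costs[groups == g]))
               for g in np.unique(groups)}
    return max(burdens.values()), burdens

costs = np.array([1.0, 3.0, 2.0, 6.0])
membership = np.array([0, 0, 1, 1])
worst, per_group = worst_case_social_burden(costs, membership)
```

Minimizing this quantity, rather than equalizing costs across groups, is what distinguishes the social-burden view from the equal-cost paradigm the abstract critiques.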

Subject: AAAI.2026 - Machine Learning


#25 Differentially Private Linear Programming: Reduced Sub-Optimality and Guaranteed Constraint Satisfaction [PDF] [Copy] [Kimi] [REL]

Authors: Alexander Benvenuti, Brendan Bialy, Miriam Dennis, Matthew Hale

Linear programming is a fundamental tool in a wide range of decision systems. However, without privacy protections, sharing the solution to a linear program may reveal information about the underlying data used to formulate it, which may be sensitive. Therefore, in this paper we introduce an approach for protecting sensitive data while formulating and solving a linear program. First, we prove that this method perturbs objectives and constraints in a way that makes them differentially private. Then, we show that (i) privatized problems always have solutions, and (ii) their solutions satisfy the constraints in their corresponding original, non-private problems. The latter result solves an open problem in the literature. Next, we analytically bound the expected sub-optimality of solutions that is induced by privacy. Numerical simulations show that, under a typical privacy setup, the solution produced by our method yields a 65% reduction in sub-optimality compared to the state of the art.
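The general shape of the approach, perturbing both the objective and the constraint data before solving, can be sketched with Laplace noise on a standard-form LP (min c^T x s.t. Ax <= b). This is a generic differential-privacy sketch, not the paper's exact mechanism: in particular, the feasibility margin below is a heuristic stand-in for the paper's proven constraint-satisfaction guarantee.

```python
import numpy as np

def privatize_lp(c, A, b, epsilon, sensitivity=1.0, margin=None, rng=None):
    """Laplace-perturb LP data (min c^T x s.t. A x <= b).

    Adds Laplace noise of scale sensitivity/epsilon to c and b, then
    tightens b by a margin so that solutions of the private problem
    tend to remain feasible for the original one (heuristic margin;
    the paper proves a formal guarantee).
    """
    rng = rng if rng is not None else np.random.default_rng()
    scale = sensitivity / epsilon
    c_priv = c + rng.laplace(0.0, scale, size=c.shape)
    b_priv = b + rng.laplace(0.0, scale, size=b.shape)
    if margin is None:
        margin = 3.0 * scale  # shrink the feasible region as a buffer
    return c_priv, A, b_priv - margin

c = np.ones(3)
A = np.eye(3)
b = np.ones(3)
c_priv, A_priv, b_priv = privatize_lp(c, A, b, epsilon=1.0,
                                      rng=np.random.default_rng(0))
```

The privatized tuple can then be handed to any off-the-shelf LP solver; only the perturbed data, never the raw data, influences the published solution.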

Subject: AAAI.2026 - Machine Learning