AAAI.2026 - Multiagent Systems | Cool Papers

#1 Local Guidance for Configuration-Based Multi-Agent Pathfinding [PDF] [Copy] [Kimi] [REL]

Authors: Tomoki Arita, Keisuke Okumura

Guidance is an emerging concept that improves the empirical performance of real-time, sub-optimal multi-agent pathfinding (MAPF) methods. It offers additional information to MAPF algorithms to mitigate congestion on a global scale by considering the collective behavior of all agents across the entire workspace. This global perspective helps reduce agents' waiting times, thereby improving overall coordination efficiency. In contrast, this study explores an alternative approach: providing local guidance in the vicinity of each agent. While such localized methods involve recomputation as agents move and may appear computationally demanding, we empirically demonstrate that supplying informative spatiotemporal cues to the planner can significantly improve solution quality without exceeding a moderate time budget. When applied to LaCAM, a leading configuration-based solver, this form of guidance establishes a new performance frontier for MAPF.

Subject: AAAI.2026 - Multiagent Systems

#2 When Natural Strategies Meet Fuzziness and Resource-Bounded Actions [PDF] [Copy] [Kimi] [REL]

Authors: Marco Aruta, Francesco Improta, Vadim Malvone, Aniello Murano

In formal strategic reasoning for Multi-Agent Systems (MAS), agents are typically assumed to (i) employ arbitrarily complex strategies, (ii) execute each move at zero cost, and (iii) operate over fully crisp game structures. These idealized assumptions stand in stark contrast with human decision-making in real-world environments. The natural strategies framework, along with some of its recent variants, partially addresses this gap by restricting strategies to concise rules guarded by regular expressions. Yet, it still overlook both the cost of each action and the uncertainty that often characterizes human perception of facts over the time. In this work, we introduce HumanATLF, a logic that builds upon natural strategies employing both fuzzy semantics and resource‐bound actions: each action carries a real-valued cost drawn from a non‐refillable budget, and atomic conditions and goals have degrees in [0,1]. We give a formal syntax and semantics, and prove that model checking is in P when both the strategy complexity k and resource budget b are fixed, NP-complete if just one strategic operator over Boolean objectives is allowed, and Delta^P_2‐complete when k and b vary. Moreover, we show that recall‐based strategies can be decided in PSPACE. We implement our algorithms in VITAMIN, an open source model-checking tool for MAS and validate them on an adversarial resource-aware drone rescue scenario.

Subject: AAAI.2026 - Multiagent Systems

#3 Efficient Multiagent Planning via Shared Action Suggestions [PDF] [Copy] [Kimi] [REL]

Authors: Dylan M. Asmar, Mykel J. Kochenderfer

Decentralized partially observable Markov decision processes with communication (Dec-POMDP-Com) provide a framework for multiagent decision making under uncertainty, but the NEXP-complete complexity for finite-horizon problems renders solutions intractable in general. While sharing actions and observations can reduce the complexity to PSPACE-complete, we propose an approach that bridges POMDPs and Dec-POMDPs by communicating only suggested joint actions, eliminating the need to share observations while retaining near-centralized performance. Our algorithm estimates joint beliefs using shared actions to prune infeasible beliefs. Each agent maintains possible belief sets for other agents, pruning them based on suggested actions to form an estimated joint belief usable with any centralized policy. This approach requires solving a POMDP for each agent, reducing computational complexity while preserving performance. We demonstrate its effectiveness on several Dec-POMDP benchmarks, showing performance comparable to centralized methods when shared actions enable effective belief pruning. This action-based communication framework offers a natural avenue for integrating human-agent cooperation, opening new directions for scalable multiagent planning under uncertainty, with applications in both autonomous systems and human-agent teams.

Subject: AAAI.2026 - Multiagent Systems

#4 A Phase Transition for Opinion Dynamics with Competing Biases [PDF] [Copy] [Kimi] [REL]

Authors: Federico Capannoli, Emilio Cruciani, Hlafo Alfie Mimun, Matteo Quattropani

We study the nonlinear evolution of binary opinions in a population of agents connected by a directed network, influenced by two competing forces. On the one hand agents are stubborn, i.e., have a tendency for one of the two opinions; on the other hand there is a disruptive bias that drives the agents toward the opposite opinion. The disruptive bias models external factors such as market innovations or social controllers aiming to challenge the status quo, while stubbornness reinforces the initial opinion making it harder for the external bias to drive the process toward change. Each agent updates its opinion according to a nonlinear rule that takes into account the opinions of its neighbors and the strength of the disruptive bias. We focus on random directed graphs with prescribed in- and out-degree sequences and prove that the dynamics exhibits a phase transition. When the disruptive bias is stronger than a certain critical threshold, the entire population rapidly converges to a consensus on the disruptive opinion. When the bias is weaker than this threshold, the system enters a metastable state in which only a fraction of the population adopts the new opinion, and this partial adoption persists for a long time. We explicitly characterize both the critical threshold and the long-term adoption fraction, showing that they depend only on few simple statistics of the degree sequences. Our analysis relies on a dual system of coalescing, branching, and dying particles, whose behavior is equivalent and allows a rigorous characterization of the system's dynamics. Our results characterize the interplay between the degree of the agents, their stubbornness, and the external bias, shedding light on the tipping points of opinion dynamics in networks.

Subject: AAAI.2026 - Multiagent Systems

#5 Mental Model-based Generation of Lies for Insider Threat Modeling [PDF] [Copy] [Kimi] [REL]

Authors: Brittany Cates, Sarath Sreedharan

It is well understood that mental modeling forms the foundation of many everyday interactions between humans. This includes both collaborative and deceptive interactions. One could argue that the modeling and manipulation of mental states lies at the heart of effective deception. In this paper, we examine the security problem of insider threat attacks. In this case, an adversary has already infiltrated an organization. The primary challenge for this attacker is to avoid suspicion until their true goal can be achieved. We see how existing model-based explanatory methods can be leveraged to generate lies that explain away potentially suspicious activities. We also propose a novel planning formulation which generates plans that appear to achieve an assigned goal while getting close enough to reach an alternative, covert goal. We evaluate our method through computational experiments and a user study.

Subject: AAAI.2026 - Multiagent Systems

#6 G-UBS: Towards Robust Understanding of Implicit Feedback via Group-Aware User Behavior Simulation [PDF] [Copy] [Kimi] [REL]

Authors: Boyu Chen, Siran Chen, Zhengrong Yue, Kainan Yan, Chenyun Yu, Beibei Kong, Lei Cheng, Chengxiang Zhuo, Zang Li, Yali Wang

User feedback is critical for refining recommendation systems, yet explicit feedback (e.g., likes or dislikes) remains scarce in practice. As a more feasible alternative, inferring user preferences from massive implicit feedback has shown great potential (e.g., a user quickly skipping a recommended video usually indicates disinterest). Unfortunately, implicit feedback is often noisy: a user might skip a video due to accidental clicks or other reasons, rather than disliking it. Such noise can easily misjudge user interests, thereby undermining recommendation performance. To address this issue, we propose a novel Group-aware User Behavior Simulation (G-UBS) paradigm, which leverages contextual guidance from relevant user groups, enabling robust and in-depth interpretation of implicit feedback for individual users. Specifically, G-UBS operates via two key agents. First, the User Group Manager (UGM) effectively clusters users to generate group profiles utilizing a ``summarize-cluster-reflect" workflow based on LLMs. Second, the User Feedback Modeler (UFM) employs an innovative group-aware reinforcement learning approach, where each user is guided by the associated group profiles during the reinforcement learning process, allowing UFM to robustly and deeply examine the reasons behind implicit feedback. To assess our G-UBS paradigm, we have constructed a Video Recommendation benchmark with Implicit Feedback (IF-VR). To the best of our knowledge, this is the first multi-modal benchmark for implicit feedback evaluation in video recommendation, encompassing 15k users, 25k videos, and 933k interaction records with implicit feedback. Extensive experiments on IF-VR demonstrate that G-UBS significantly outperforms mainstream LLMs and MLLMs, with a 4.0% higher proportion of videos achieving a play rate > 30% and 14.9% higher reasoning accuracy on IF-VR.

Subject: AAAI.2026 - Multiagent Systems

#7 GUI-Eyes: Tool-Augmented Perception for Visual Grounding in GUI Agents [PDF] [Copy] [Kimi] [REL]

Authors: Chen Chen, Jiawei Shao, Dakuan Lu, Haoyi Hu, Xiangcheng Liu, Hantao Yao, Wu Liu

Recent advances in vision-language models (VLMs) and reinforcement learning (RL) have driven progress in GUI automation. However, most existing methods rely on static, one-shot visual inputs and passive perception, lacking the ability to adaptively determine when, whether, and how to observe the interface. We present GUI-Eyes, a reinforcement learning framework for active visual perception in GUI tasks. To acquire more informative observations, the agent learns to make strategic decisions on both whether and how to invoke visual tools, such as cropping or zooming, within a two-stage reasoning process. To support this behavior, we introduce a progressive perception strategy that decomposes the decision-making into coarse exploration and fine-grained grounding, coordinated by a two-level policy. In addition, we design a spatially continuous reward function tailored to tool usage, which integrates both location proximity and region overlap to provide dense supervision and alleviate the reward sparsity common in GUI environments. On the ScreenSpot-Pro benchmark, GUI-Eyes-3B achieves 44.8% grounding accuracy using only 3k labeled samples, significantly outperforming both supervised and RL-based baselines. These results highlight that tool-aware active perception, enabled by staged policy reasoning and fine-grained reward feedback, is critical for building robust and data-efficient GUI agents.

Subject: AAAI.2026 - Multiagent Systems

#8 Adversarial Attack on Black-Box Multi-Agent by Adaptive Perturbation [PDF] [Copy] [Kimi] [REL]

Authors: Jianming Chen, Yawen Wang, Junjie Wang, Xiaofei Xie, Yuanzhe Hu, Qing Wang, Fanjiang Xu

Evaluating security and reliability for multi-agent systems (MAS) is urgent as they become increasingly prevalent in various applications. As an evaluation technique, existing adversarial attack frameworks face certain limitations, e.g., impracticality due to the requirement of white-box information or high control authority, and a lack of stealthiness or effectiveness as they often target all agents or specific fixed agents. To address these issues, we propose AdapAM, a novel framework for adversarial attacks on black-box MAS. AdapAM incorporates two key components: (1) Adaptive Selection Policy simultaneously selects the victim and determines the anticipated malicious action (the action would lead to the worst impact on MAS), balancing effectiveness and stealthiness. (2) Proxy-based Perturbation to Induce Malicious Action utilizes generative adversarial imitation learning to approximate the target MAS, allowing AdapAM to generate perturbed observations using white-box information and thus induce victims to execute malicious action in black-box settings. We evaluate AdapAM across eight multi-agent environments and compare it with four state-of-the-art and commonly-used baselines. Results demonstrate that AdapAM achieves the best attack performance in different perturbation rates. Besides, AdapAM-generated perturbations are the least noisy and hardest to detect, emphasizing the stealthiness.

Subject: AAAI.2026 - Multiagent Systems

#9 Simulating Dispute Mediation with LLM-Based Agents for Legal Research [PDF] [Copy] [Kimi] [REL]

Authors: Junjie Chen, Haitao Li, Minghao Qin, Yujia Zhou, Yanxue Ren, Wuyue Wang, Yiqun Liu, Yueyue Wu, Qingyao Ai

Legal dispute mediation plays a crucial role in resolving civil disputes, yet its empirical study is limited by privacy constraints and complex multivariate interactions. To address this limitation, we present AgentMediation, the first LLM-based agent framework for simulating dispute mediation. It simulates realistic mediation processes grounded in real-world disputes and enables controlled experimentation on key variables such as disputant strategies, dispute causes, and mediator expertise. Our empirical analysis reveals patterns consistent with sociological theories, including Group Polarization and Surface-level Consensus. As a comprehensive and extensible platform, AgentMediation paves the way for deeper integration of social science and AI in legal research.

Subject: AAAI.2026 - Multiagent Systems

#10 ZeRCP: Towards Communication-Efficient Collaborative Perception and Future Scene Prediction via Request-Free Spatial Filtering [PDF] [Copy] [Kimi] [REL]

Authors: Yijie Chen, Yuzhe Ji, Haotian Wang, Xiaoyun Qiu, Ying-Cong Chen, Xinhu Zheng

Multi-Agent collaboration addresses inherent limitations of individual agent systems, including limited sensing range and occlusion-induced blind spots. Despite significant progress, persistent challenges such as constrained communication bandwidth and under-explored subsequent extensions still hinder real-time deployment and further developments of collaborative autonomous driving systems. In this work, we propose ZeRCP, a unified communication-efficient framework that bridges collaborative perception with future scene prediction. Specifically, (i) we devise a plug-and-play request-free spatial filtering module (ZeroR) that eliminates the reliance on request maps while preserving inter-agent spatial complementarity modeling. This approach further reduce communication latency and bandwidth consumptions. (ii) We design a multi-scale pyramidal prediction network anchored by a novel Spatial-Temporal Deformable Attention (STDA) module, extending frame-wise detection to multi-frame predictions. This method adeptly models spatiotemporal dynamics without relying on auto-regressive recursion. We evaluate our method on a large-scale dataset in challenging semantic segmentation and scene prediction tasks. Extensive experiments demonstrate the superiority and effectiveness of ZeRCP in bandwidth-constrained collaboration scenarios and spatiotemporal prediction applications.

Subject: AAAI.2026 - Multiagent Systems

#11 GDBA Revisited: Unleashing the Power of Guided Local Search for Distributed Constraint Optimization [PDF] [Copy] [Kimi] [REL]

Authors: Yanchen Deng, Xinrun Wang, Bo An

Local search is an important class of incomplete algorithms for solving Distributed Constraint Optimization Problems (DCOPs) but it often converges to poor local optima. While Generalized Distributed Breakout Algorithm (GDBA) provides a comprehensive rule set to escape premature convergence, its empirical benefits remain marginal on general-valued problems. In this work, we systematically examine GDBA and identify three factors that potentially lead to its inferior performance, i.e., over-aggressive constraint violation conditions, unbounded penalty accumulation, and uncoordinated penalty updates. To address these issues, we propose Distributed Guided Local Search (DGLS), a novel GLS framework for DCOPs that incorporates an adaptive violation condition to selectively penalize constraints with high cost, a penalty evaporation mechanism to control the magnitude of penalization, and a synchronization scheme for coordinated penalty updates. We theoretically show that the penalty values are bounded, and agents play a potential game in DGLS. Extensive empirical results on various benchmarks demonstrate the great superiority of DGLS over state-of-the-art baselines. Compared to Damped Max-sum with high damping factors, our DGLS achieves competitive performance on general-valued problems, and outperforms by significant margins on structured problems in terms of anytime results.

Subject: AAAI.2026 - Multiagent Systems

#12 S-DAG: A Subject-Based Directed Acyclic Graph for Multi-Agent Heterogeneous Reasoning [PDF] [Copy] [Kimi] [REL]

Authors: Jiangwen Dong, Zehui Lin, Wanyu Lin, Mingjin Zhang

Large Language Models (LLMs) have achieved impressive performance in complex reasoning problems. Their effectiveness highly depends on the specific nature of the task, especially the required domain knowledge. Existing approaches, such as mixture-of-experts, typically operate at the task level; they are too coarse to effectively solve the heterogeneous problems involving multiple subjects. This work proposes a novel framework that performs fine-grained analysis at subject level equipped with a designated multi-agent collaboration strategy for addressing heterogeneous problem reasoning. Specifically, given an input query, we first employ a Graph Neural Network to identify the relevant subjects and infer their interdependencies to generate an Subject-based Directed Acyclic Graph (S-DAG), where nodes represent subjects and edges encode information flow. Then we profile the LLM models by assigning each model a subject-specific expertise score, and select the top-performing one for matching corresponding subject of the S-DAG. Such subject-model matching enables graph-structured multi-agent collaboration where information flows from the starting model to the ending model over S-DAG. We curate and release multi-subject subsets of standard benchmarks (MMLU-Pro, GPQA, MedMCQA) to better reflect complex, real-world reasoning tasks. Extensive experiments show that our approach significantly outperforms existing task-level model selection and multi-agent collaboration baselines in accuracy and efficiency. These results highlight the effectiveness of subject-aware reasoning and structured collaboration in addressing complex and multi-subject problems.

Subject: AAAI.2026 - Multiagent Systems

#13 iMAD: Intelligent Multi-Agent Debate for Efficient and Accurate LLM Inference [PDF] [Copy] [Kimi] [REL]

Authors: Wei Fan, JinYi Yoon, Bo Ji

Large Language Model (LLM) agent systems have advanced rapidly, driven by their strong generalization in zero-shot settings. To further enhance reasoning and accuracy on complex tasks, Multi-Agent Debate (MAD) has emerged as a promising framework that engages multiple LLM agents in structured debates to encourage diverse reasoning. However, triggering MAD for every query is inefficient, as it incurs substantial computational (token) cost and may even degrade accuracy by overturning correct answers from single-agent. To address these limitations, we propose intelligent Multi-Agent Debate (iMAD), a token-efficient framework that selectively triggers MAD only when it is likely to be beneficial (i.e., correcting an initially wrong answer). To achieve this goal, iMAD learns generalizable model behaviors to make accurate debate decisions. Specifically, iMAD first prompts a single agent to produce a structured self-critique response, from which we extract 41 interpretable linguistic and semantic features capturing hesitation cues. Then, iMAD uses a lightweight debate decision classifier, trained using our proposed FocusCal loss without test-dataset-specific tuning, to make robust zero-shot debate decisions. Through extensive experiments using six (visual) question answering datasets against five competitive baselines, we show that iMAD significantly reduces token usage (by up to 92%) while also improving final answer accuracy (by up to 13.5%).

Subject: AAAI.2026 - Multiagent Systems

#14 Cost-Effective Communication: An Auction-based Method for Language Agent Interaction [PDF] [Copy] [Kimi] [REL]

Authors: Yijia Fan, Jusheng Zhang, Kaitong Cai, Jing Yang, Chengpei Tang, Jian Wang, Keze Wang

Multi-agent systems (MAS) built on large language models (LLMs) often suffer from inefficient ''free-for-all'' communication, leading to exponential token costs and low signal-to-noise ratios that hinder their practical deployment. We challenge the notion that more communication is always beneficial, hypothesizing instead that the core issue is the absence of resource rationality. We argue that "free'' communication, by ignoring the principle of scarcity, inherently breeds inefficiency and unnecessary expenses. To address this, we introduce the Dynamic Auction-based Language Agent (DALA), a novel framework that treats communication bandwidth as a scarce and tradable resource. Specifically, our DALA regards inter-agent communication as a centralized auction, where agents learn to bid for the opportunity to speak based on the predicted value density of their messages. Thus, our DALA intrinsically encourages agents to produce concise, informative messages while filtering out low-value communication. Extensive and comprehensive experiments demonstrate that our economically-driven DALA achieves new state-of-the-art performance across seven challenging reasoning benchmarks, including 84.32% on MMLU and a 91.21% pass@1 rate on HumanEval. Note that this is accomplished with remarkable efficiency, i.e., our DALA uses only 6.25 million tokens, a fraction of the resources consumed by current state-of-the-art methods on GSM8K. Further analysis reveals that our DALA cultivates the emergent skill of strategic silence, effectively adapting its communication strategies from verbosity to silence in a dynamic manner via resource constraints.

Subject: AAAI.2026 - Multiagent Systems

#15 Symbolic Planning and Multi-Agent Path Finding in Extremely Dense Environments with Unassigned Agents [PDF] [Copy] [Kimi] [REL]

Authors: Bo Fu, Zhe Chen, Rahul Chandan, Alexandre Ormiga Galvao Barbosa, Michael Caldara, Joey W. Durham, Federico Pecora

We introduce the Block Rearrangement Problem (BRaP), a challenging component of large warehouse management which involves rearranging storage blocks within dense grids to achieve a goal state. We formally define the BRaP as a graph search problem. Building on intuitions from sliding puzzle problems, we propose five search-based solution algorithms, leveraging joint configuration space search, classical planning, multi-agent pathfinding, and expert heuristics. We evaluate the five approaches empirically for plan quality and scalability. Despite the exponential relation between search space size and block number, our methods demonstrate efficiency in creating rearrangement plans for deeply buried blocks in up to 80x80 grids.

Subject: AAAI.2026 - Multiagent Systems

#16 Counterfactual Planning for Generalizable Agents’ Actions [PDF] [Copy] [Kimi] [REL]

Authors: Jiarun Fu, Lizhong Ding, Qiuning Wei, Yuhan Guo, Yurong Cheng, Junyu Zhang

Large language models have revolutionized agent planning by serving as the engine of heuristic guidance. However, LLM-based agents often struggle to generalize across complex environments and to adapt to stochastic feedback arising from environment–action interactions. We propose Counterfactual Planning—a method designed to improve the generalizability and adaptability of agents' actions by inferring causal representations of environmental confounders and performing counterfactual reasoning over planned actions. We formalize the agent planning process as a structural causal model, providing a mathematical formulation for causal analysis of how environmental states influence action generation and how actions affect future state transitions. To support generalizable action planning, we introduce the State Causality Evaluator (SCE), which dynamically infers task-conditioned causal representations from complex environment states; and to enhance adaptability under stochastic feedback, we propose the What-If-Not (WIN) reward, which performs counterfactual interventions to refine actions through causal evaluation. We validate our framework in an open-world environment, where experiments demonstrate improvements in both action generalization and planning adaptability.

Subject: AAAI.2026 - Multiagent Systems

#17 GRDC: A Unified Graph-Driven Framework for Role Discovery and Communication in Multi-Agent Reinforcement Learning [PDF] [Copy] [Kimi] [REL]

Authors: Zihong Gao, Hongjian Liang, Yuanhui Hao, Lei Hao, Liangjun Ke

Effective coordination in Multi-Agent Reinforcement Learning (MARL) is particularly challenging under partial observability, where agents must reason about potential collaborators using only local information. Existing methods fall into two categories: communication-based approaches that enable message exchange but often fix or misidentify who the collaborators are, and role-based approaches that encourage specialization based on behavioral similarity. However, both lines of work overlook the task‑induced cooperative dependencies that decide which agents should collaborate, leading to miscommunication or role misassignment under partial observability. We introduce GRDC (Graph‑driven Role Discovery and Communication), a unified framework that approximates these dependencies by dynamically constructing local interaction graphs from trajectory embeddings, then uses these graphs to infer roles via prototype matching and to restrict communication to intra‑role agents with attention-based aggregation. Beyond role inference and communication, GRDC maximizes role entropy, decorrelates prototypes, and dynamically prunes redundant ones to obtain structured yet compact role specialization. Experimental results on Predator Prey, Cooperative Navigation, and SMACv2 demonstrate that GRDC consistently outperforms state-of-the-art communication- and role-based baselines, improving coordination efficiency and training stability across tasks.

Subject: AAAI.2026 - Multiagent Systems

#18 Emergent Fast-Slow Dynamics in Multi-Agent Q-Learning for Networked Stochastic Games [PDF] [Copy] [Kimi] [REL]

Authors: Yuxin Geng, Wolfram Barfuss, Xingru Chen

Understanding the emergence of collective behaviors of multi-agent systems requires investigating the learning dynamics. However, the theoretical analysis of large-scale graph-structured multi-agent reinforcement learning (MARL) systems remains challenging due to agent heterogeneity and the intrinsic coupling between state transitions and individual Q-value updates. In this work, we develop a unified theoretical framework that captures the evolution of agent behaviors at both individual and population levels. By leveraging the pair approximation technique from statistical physics, we derive a closed set of evolution equations that accurately describe the temporal dynamics of the system. Our analysis also reveals a separation of time scales. For small learning rates, state transitions equilibrate rapidly, while Q-value updates evolve slowly with stationary state distributions. Through extensive agent-based simulations, we validate the robustness of our theoretical results and explain the mechanisms that lead to the emergence of cooperation in social dilemmas. Our framework offers new perspectives for bridging complex systems science and MARL, providing insights for the design of cooperative and resilient AI.

Subject: AAAI.2026 - Multiagent Systems

#19 Agent-SAMA: State-Aware Mobile Assistant [PDF] [Copy] [Kimi] [REL]

Authors: Linqiang Guo, Wei Liu, Yi Wen Heng, Tse-Hsun (Peter) Chen, Yang Wang

Mobile Graphical User Interface (GUI) agents aim to autonomously complete tasks within or across apps based on user instructions. While recent Multimodal Large Language Models (MLLMs) enable these agents to interpret UI screens and perform actions, existing agents remain fundamentally reactive. They reason over the current UI screen but lack a structured representation of the app navigation flow, lim- iting GUI agents’ ability to understand execution context, detect unexpected execution results, and recover from errors. We introduce Agent-SAMA, a state-aware multi-agent framework that models app execution as a Finite State Machine (FSM), treating UI screens as states and user actions as transitions. Agent-SAMA implements four specialized agents that collaboratively construct and use FSMs in real time to guide task planning, execution verification, and recovery. We evaluate Agent-SAMA on two types of benchmarks: cross- app (Mobile-Eval-E, SPA-Bench) and mostly single-app (AndroidWorld). On Mobile-Eval-E, Agent-SAMA achieves an 84.0% success rate and a 71.9% recovery rate. On SPA-Bench, it reaches an 80.0% success rate with a 66.7% recovery rate. Compared to prior methods, Agent-SAMA improves task success by up to 12% and recovery success by 13.8%. On AndroidWorld, Agent-SAMA achieves a 63.7% success rate, outperforming the baselines. Our results demonstrate that structured state modeling enhances robustness and can serve as a lightweight, model-agnostic memory layer for future GUI agents.

Subject: AAAI.2026 - Multiagent Systems

#20 Orion: Steering Personalized Web Agents via Global-Micro Profiling and Adaptive Intent Tracking [PDF] [Copy] [Kimi] [REL]

Authors: Die Hu, Jingguo Ge, Weitao Tang, He Kong, Liangxiong Li, Bingzhen Wu

Recently, Large Language Models (LLMs) based Web Agents have shown significant potential in web understanding and interaction tasks. However, their personalization ability and user experience remain limited by the ambiguity and dynamic nature of user intent, struggling to model diverse user interests and track intent changes over time. To address these challenges, this paper proposes Orion, a novel personalized Web Agent. Orion adopts a global-micro profiling mechanism to balance users' long-term stable preferences and scenario-based needs, and introduces context-aware interest retrieval to enhance personalization. Additionally, we design adaptive profile tracking and proactive disambiguation mechanisms to effectively address the continuous evolution of user intent in multi-turn interactions. Orion is optimized through end-to-end online reinforcement learning, improving personalized reasoning and decision-making ability in real interactive scenarios. Experiments demonstrate that Orion significantly outperforms state-of-the-art baselines in personalized understanding and task efficiency.

Subject: AAAI.2026 - Multiagent Systems

#21 Tapas Are Free! Training-Free Adaptation of Programmatic Agents via LLM-Guided Program Synthesis in Dynamic Environments [PDF] [Copy] [Kimi] [REL]

Authors: Jinwei Hu, Yi Dong, Youcheng Sun, Xiaowei Huang

Autonomous agents in safety-critical applications must continuously adapt to dynamic conditions without compromising performance and reliability. This work introduces TAPA (Training-free Adaptation of Programmatic Agents), a novel framework that positions large language models (LLMs) as intelligent moderators of the symbolic action space. Unlike prior programmatic agents typically generate a monolithic policy program or rely on fixed symbolic action sets, TAPA synthesizes and adapts modular programs for individual high-level actions, referred to as logical primitives. By decoupling strategic intent from execution, TAPA enables meta-agents to operate over an abstract, interpretable action space while the LLM dynamically generates, composes, and refines symbolic programs tailored to each primitive. Extensive experiments across cybersecurity and swarm intelligence domains validate TAPA's effectiveness. In autonomous DDoS defense scenarios, TAPA achieves 77.7% network uptime while maintaining near-perfect detection accuracy in unknown dynamic environments. In swarm intelligence formation control under environmental and adversarial disturbances, TAPA consistently preserves consensus at runtime where baseline methods fail. This work promotes a paradigm shift for autonomous system design in evolving environments, from policy adaptation to dynamic action adaptation.

Subject: AAAI.2026 - Multiagent Systems

#22 Unreal-MAP: Unreal-Engine-Based General Platform for Multi-agent Reinforcement Learning [PDF] [Copy] [Kimi] [REL]

Authors: Tianyi Hu, Qingxu Fu, Zhiqiang Pu, Yuan Wang, Tenghai Qiu

In this paper, we propose Unreal Multi-Agent Playground (Unreal-MAP), an MARL general platform based on the Unreal-Engine (UE). Unreal-MAP allows users to freely create multi-agent tasks using the vast visual and physical resources available in the UE community, and deploy state-of-the-art (SOTA) MARL algorithms within them. Unreal-MAP is user-friendly in terms of deployment, modification, and visualization, and all its components are open-source. We also develop an experimental framework compatible with algorithms ranging from rule-based to learning-based provided by third-party frameworks. Lastly, we deploy several SOTA algorithms in example tasks developed via Unreal-MAP, and conduct corresponding experimental analyses including a sim2real demo. We believe Unreal-MAP can play an important role in the MARL field by closely integrating existing algorithms with user-customized tasks, thus advancing the field of MARL.

Subject: AAAI.2026 - Multiagent Systems

#23 GRIP: Latent Field-Guided Graph Policy for Budget-Constrained Multi-Agent Routing [PDF] [Copy] [Kimi] [REL]

Authors: Yujiao Hu, Zuyu Chen, MengJie Lee, Jinchao Chen, Meng Shen, Hailun Zhang, Wei Li, Yan Pan

Subset selection under budget constraints is critical in applications like multi-robot patrolling, crime deterrence, and targeted marketing, where multiple agents must jointly select targets and plan feasible routes. We formalize this challenge as Multi-Subset Selection with Budget-Constrained Routing (MSS-BCR), involving complex, non-additive cost structures that defy traditional methods. We propose GRIP, a graph-based framework integrating spatial reward fields and policy learning to enable coordinated, budget-aware target selection and routing. GRIP uses attention-based embeddings and constraint-triggered pruning with utility recovery to produce high-quality, feasible solutions. Experiments based on multiple synthetic and real-world datasets show GRIP outperforms baselines in reward efficiency and scalability across varied scenarios.

Subject: AAAI.2026 - Multiagent Systems

#24 Graph Attention-Guided Search for Dense Multi-Agent Pathfinding [PDF] [Copy] [Kimi] [REL]

Authors: Rishabh Jain, Keisuke Okumura, Michael Amir, Amanda Prorok

Finding near-optimal solutions for dense multi-agent pathfinding (MAPF) problems in real-time remains challenging even for state-of-the-art planners. To this end, we develop a hybrid framework that integrates a learned heuristic derived from MAGAT, a neural MAPF policy with a graph attention scheme, into a leading search-based algorithm, LaCAM. While prior work has explored learning-guided search in MAPF, such methods have historically underperformed. In contrast, our approach, termed LaGAT, outperforms both purely search-based and purely learning-based methods in dense scenarios. This is achieved through an enhanced MAGAT architecture, a pre-train–then–fine-tune strategy on maps of interest, and a deadlock detection scheme to account for imperfect neural guidance. Our results demonstrate that, when carefully designed, hybrid search offers a powerful solution for tightly coupled, challenging multi-agent coordination problems.

Subject: AAAI.2026 - Multiagent Systems

#25 Robust Multiagent Combinatorial Path Finding [PDF] [Copy] [Kimi] [REL]

Authors: Yehonatan Kidushim, Avraham Natan, Roni Stern, Meir Kalech

Consider a system of multiple physical agents tasked with collaboratively collecting a set of spatially distributed goals as quickly as possible while avoiding collisions with the environment and with each other. This type of problem, which involves Multi-Agent Path Finding (MAPF) and task allocation, is called Multi-Agent Combinatorial Path Finding (MCPF). Prior work on MCPF assumed each agent has a final goal it must reach, there are no orientation constraints on the agents' movements, and the agents will follow their planned actions as intended. These assumptions rarely hold in real physical robots, which limits the applicability of existing MCPF algorithms in practical applications. We propose the Robust CBSS framework, a robust planning approach that solves MCPF without the aforementioned simplifying assumptions, and provide two implementations: a baseline version (RCbssBase) and an efficient version (RCbssEff). RCbssEff generalizes the Conflict-Based Steiner Search (CBSS) algorithm, building on ideas from the p-Robust CBS algorithm and algorithms for solving the Equality Generalized Traveling Salesman Problem. We prove that RCbssEff is complete and can be configured to return optimal solutions. Experimental results on benchmark MCPF problems show that RCbssEff balances planning time, solution cost, and collision reduction compared to baselines.

Subject: AAAI.2026 - Multiagent Systems