AAAI.2025 - Multiagent Systems

Total: 32

#1 MAPF-GPT: Imitation Learning for Multi-Agent Pathfinding at Scale

Authors: Anton Andreychuk, Konstantin Yakovlev, Aleksandr Panov, Alexey Skrynnik

Multi-agent pathfinding (MAPF) is a problem that generally requires finding collision-free paths for multiple agents in a shared environment. Solving MAPF optimally, even under restrictive assumptions, is NP-hard, yet efficient solutions are critical for numerous applications, such as automated warehouses and transportation systems. Recently, learning-based approaches to MAPF have gained attention, particularly those leveraging deep reinforcement learning. Typically, such learning-based MAPF solvers are augmented with additional components like single-agent planning or communication. In this work, by contrast, we rely solely on imitation learning, leveraging a large dataset of expert MAPF solutions and a transformer-based neural network to create a foundation model for MAPF called MAPF-GPT. The resulting model generates actions without additional heuristics or communication and demonstrates zero-shot generalization to MAPF problems not present in the training dataset. We show that MAPF-GPT notably outperforms the current best-performing learnable MAPF solvers on a diverse range of problem instances and is computationally efficient during inference.
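To make the problem setting concrete, a joint MAPF plan is valid when every agent follows its path without a vertex conflict (two agents in the same cell at the same timestep) or an edge conflict (two agents swapping cells). The sketch below is our own illustration of these standard conditions, not code from MAPF-GPT; the function name and grid representation are assumptions.

```python
def is_collision_free(paths):
    """paths: list of equal-length lists of (row, col) cells, one per agent.
    Returns True iff the joint plan has no vertex or edge conflicts."""
    horizon = len(paths[0])
    assert all(len(p) == horizon for p in paths), "pad paths to equal length"
    for t in range(horizon):
        cells = [p[t] for p in paths]
        if len(set(cells)) < len(cells):  # vertex conflict: shared cell at time t
            return False
        if t > 0:
            for i in range(len(paths)):
                for j in range(i + 1, len(paths)):
                    # edge conflict: agents i and j swap cells between t-1 and t
                    if (paths[i][t] == paths[j][t - 1]
                            and paths[j][t] == paths[i][t - 1]):
                        return False
    return True
```

For example, two agents moving along parallel rows pass the check, while two agents swapping adjacent cells fail it.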

Subject: AAAI.2025 - Multiagent Systems


#2 Orpheus: Engineering Multiagent Systems via Communicating Agents

Authors: Matteo Baldoni, Samuel H. Christie V, Munindar P. Singh, Amit K. Chopra

We propose Orpheus, a novel programming model for communicating agents based on information protocols and realized using cognitive programming. Whereas traditional models focus on reactions for handling incoming messages, Orpheus supports organizing the internal logic of an agent around its goals. We give an operational semantics for Orpheus and implement this semantics in an adapter to help build agents. We use the adapter to demonstrate how Orpheus simplifies the programming of decentralized multiagent systems compared to the reactive programming model.



#3 Intelligent OPC Engineer Assistant for Semiconductor Manufacturing

Authors: Guojin Chen, Haoyu Yang, Bei Yu, Haoxing Ren

Advancements in chip design and manufacturing have enabled the processing of complex tasks such as deep learning and natural language processing, paving the way for the development of artificial general intelligence (AGI). AI, on the other hand, can be leveraged to innovate and streamline semiconductor technology from planning and implementation to manufacturing. In this paper, we present Intelligent OPC Engineer Assistant, an AI/LLM-powered methodology designed to solve the core manufacturing-aware optimization problem known as Optical Proximity Correction (OPC). The methodology involves a reinforcement learning-based OPC recipe search and a customized multi-modal agent system for recipe summarization. Experiments demonstrate that our methodology can efficiently build OPC recipes on various chip designs with specially handled design topologies, a task that typically requires the full-time effort of OPC engineers with years of experience.



#4 Contract-based Design and Verification of Multi-Agent Systems with Quantitative Temporal Requirements

Authors: Rafael Dewes, Rayna Dimitrova

Quantitative requirements play an important role in the context of multi-agent systems, where there is often a trade-off between the tasks of individual agents and the constraints that the agents must jointly adhere to. We study multi-agent systems whose requirements are formally specified in the quantitative temporal logic LTL[F] as a combination of local task specifications for the individual agents and a shared safety constraint. The intricate dependencies between the individual agents entailed by their local and shared objectives make the design of multi-agent systems error-prone, and their verification time-consuming. In this paper we address this problem by proposing a novel notion of quantitative assume-guarantee contracts that enables the compositional design and verification of multi-agent systems with quantitative temporal specifications. The crux of these contracts lies in their ability to capture the coordination between the individual agents to achieve an optimal value of the overall specification under any possible behavior of the external environment. We show that the proposed framework improves the scalability and modularity of formal verification of multi-agent systems against quantitative temporal specifications.



#5 Decentralized and Uncoordinated Learning of Stable Matchings: A Game-Theoretic Approach

Authors: S. Rasoul Etesami, R. Srikant

We consider the problem of learning stable matchings with unknown preferences in a decentralized and uncoordinated manner, where "decentralized" means that players make decisions individually without the influence of a central platform, and "uncoordinated" means that players do not need to synchronize their decisions using pre-specified rules. First, we provide a game formulation for this problem with known preferences, where the set of pure Nash equilibria (NE) coincides with the set of stable matchings, and mixed NE can be rounded to a stable matching. Then, we show that for hierarchical markets, applying the exponential weight (EXP) learning algorithm to the stable matching game achieves logarithmic regret in a fully decentralized and uncoordinated fashion. Moreover, we show that EXP converges locally and exponentially fast to a stable matching in general matching markets. We complement our results by introducing another decentralized and uncoordinated learning algorithm that globally converges to a stable matching with arbitrarily high probability.
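As a reference point for the game formulation above: a matching is stable exactly when it admits no blocking pair, i.e., no man-woman pair who each prefer the other over their assigned partners. The checker below is a generic textbook illustration; the data layout and names are our own assumptions, not the paper's.

```python
def is_stable(match, men_pref, women_pref):
    """match: dict man -> woman (a perfect matching).
    men_pref[m] / women_pref[w]: preference lists, most preferred first.
    Returns True iff no blocking pair exists."""
    man_of = {w: m for m, w in match.items()}

    def prefers(pref, a, b):  # does the owner of `pref` rank a above b?
        return pref.index(a) < pref.index(b)

    for m, w_m in match.items():
        for w in men_pref[m]:
            if w == w_m:
                break  # only consider women m ranks above his current match
            # (m, w) is a blocking pair if w also prefers m to her partner
            if prefers(women_pref[w], m, man_of[w]):
                return False
    return True
```

In the pure-NE characterization the paper describes, such a check corresponds to verifying that no player can profitably deviate.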



#6 Optimising Spatial Teamwork Under Uncertainty

Authors: Gregory Everett, Ryan J. Beal, Tim Matthews, Timothy J. Norman, Sarvapali D. Ramchurn

We introduce a novel method for assessing agent teamwork based on their spatial coordination. Our approach models the influence of spatial proximity on team formation and sustained spatial dominance over adversaries using a Multi-agent Markov Decision Process. We develop an algorithm to derive efficient teamwork strategies by combining Monte Carlo Tree Search and linear programming. When applied to team defence in football (soccer) using real-world data, our approach reduces opponent threat by 21%, outperforming optimised individual behaviour by 6%. Additionally, our model enhances the predictive accuracy of future attack locations and provides deeper insights compared to existing teamwork models that do not explicitly consider the spatial dynamics of teamwork.



#7 Exact Algorithms for Multiagent Path Finding with Communication Constraints on Tree-Like Structures

Authors: Foivos Fioravantes, Dušan Knop, Jan Matyáš Křišťan, Nikolaos Melissinos, Michal Opler

Consider the scenario where multiple agents have to move in an optimal way through a network, each one towards their ending position, while avoiding collisions. By optimal, we mean as fast as possible, as evaluated by a measure known as the makespan of the proposed solution. This is the setting studied in the Multiagent Path Finding problem. In this work we additionally provide the agents with a way to communicate with each other. Due to size constraints, it is reasonable to assume that the communication range of each agent is limited. What should the trajectories of the agents be so that, additionally, they maintain a backbone of communication? In this work we study this Multiagent Path Finding with Communication Constraint problem under the parameterized complexity framework. Our main contribution is three exact algorithms that are efficient for particular structures of the input network. We provide such algorithms for the cases when the communication range and the number of agents (resp. the makespan) are provided in the input and the network has a tree topology or bounded maximum degree (resp. a tree-like topology, i.e., bounded treewidth). We complement these results by showing that it is highly unlikely that efficient algorithms exist when the number of agents is part of the input, even if the makespan is 3 and the communication range is 1.
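For intuition, the "backbone of communication" constraint amounts to requiring that, at every timestep, the graph whose edges connect agents within communication range of each other is connected. A minimal sketch, assuming for illustration a Manhattan-distance range on a grid (the paper's exact communication model may differ):

```python
from collections import deque

def backbone_connected(positions, comm_range):
    """positions: dict agent -> (x, y). Two agents can communicate if their
    Manhattan distance is at most comm_range. Returns True iff the resulting
    communication graph over all agents is connected."""
    agents = list(positions)
    if not agents:
        return True

    def neighbors(a):
        ax, ay = positions[a]
        return [b for b in agents if b != a
                and abs(ax - positions[b][0]) + abs(ay - positions[b][1]) <= comm_range]

    # BFS from an arbitrary agent; connected iff every agent is reached
    seen = {agents[0]}
    queue = deque([agents[0]])
    while queue:
        for b in neighbors(queue.popleft()):
            if b not in seen:
                seen.add(b)
                queue.append(b)
    return len(seen) == len(agents)
```

A solver would enforce this predicate on the agents' positions at every timestep of the plan.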



#8 Solving Multiagent Path Finding on Highly Centralized Networks

Authors: Foivos Fioravantes, Dušan Knop, Jan Matyáš Křišťan, Nikolaos Melissinos, Michal Opler, Tung Anh Vu

The Multiagent Path Finding (MAPF) problem consists of identifying the trajectories that a set of agents should follow inside a given network in order to reach their desired destinations as soon as possible, but without colliding with each other. We aim to minimize the maximum time any agent takes to reach their goal, ensuring optimal path length. In this work, we complement a recent thread of results that aim to systematically study the algorithmic behavior of this problem through the parameterized complexity point of view. First, we show that MAPF is NP-hard when the given network has a star-like topology (bounded vertex cover number) or is a tree with 11 leaves. Both of these results fill important gaps in our understanding of the tractability of this problem that were left untreated in the recent work of Fioravantes et al., Exact Algorithms and Lowerbounds for Multiagent Path Finding: Power of Treelike Topology, presented in AAAI'24. Nevertheless, our main contribution is an exact algorithm that scales well as the input grows (FPT) when the topology of the given network is highly centralized (bounded distance to clique). This parameter is significant as it mirrors real-world networks, in which a set of central hubs or nodes (e.g., processing areas) is connected to peripheral nodes.



#9 Synchronization in Learning in Periodic Zero-Sum Games Triggers Divergence from Nash Equilibrium

Authors: Yuma Fujimoto, Kaito Ariu, Kenshi Abe

Learning in zero-sum games studies a situation where multiple agents competitively learn their strategies. In such multi-agent learning, we often see that the strategies cycle around their optimum, i.e., the Nash equilibrium. When a game varies periodically (called a "periodic" game), however, the Nash equilibrium generically moves. How learning dynamics behave in such periodic games is of interest but still unclear. Interestingly, we discover that the behavior is highly dependent on the relationship between the two speeds at which the game changes and at which players learn. We observe that when these two speeds synchronize, the learning dynamics diverge, and their time-average does not converge. Otherwise, the learning dynamics draw complicated cycles, but their time-average converges. Under some assumptions introduced for the dynamical-systems analysis, we prove that this behavior occurs; furthermore, our experiments show that it persists even when these assumptions are removed. This study discovers a novel phenomenon, i.e., synchronization, and provides insights widely applicable to learning in periodic games.
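The interplay of the two speeds can be explored numerically. The toy simulation below is our own construction, not the paper's model: two players run projected gradient updates with step size `speed_learn` in a matching-pennies-style game whose interior equilibrium oscillates at frequency `speed_game`.

```python
import math

def simulate(speed_game, speed_learn, steps=2000):
    """Two players track mixed strategies x, y in [0, 1] while the game's
    interior equilibrium (target, target) oscillates over time. Updates are
    projected gradient steps of the zero-sum payoff; purely illustrative."""
    x, y = 0.5, 0.5
    traj = []
    for t in range(steps):
        target = 0.5 + 0.3 * math.sin(speed_game * t)  # moving equilibrium
        gx = y - target        # row player's payoff gradient
        gy = -(x - target)     # column player's (zero-sum) gradient
        x = min(1.0, max(0.0, x + speed_learn * gx))
        y = min(1.0, max(0.0, y + speed_learn * gy))
        traj.append((x, y))
    return traj
```

Plotting trajectories for various (`speed_game`, `speed_learn`) pairs gives a quick feel for the cycling versus divergence regimes that the paper characterizes formally.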



#10 CP-Guard: Malicious Agent Detection and Defense in Collaborative Bird’s Eye View Perception

Authors: Senkang Hu, Yihang Tao, Guowen Xu, Yiqin Deng, Xianhao Chen, Yuguang Fang, Sam Kwong

Collaborative Perception (CP) has emerged as a promising technique for autonomous driving, where multiple connected and autonomous vehicles (CAVs) share their perception information to enhance overall perception performance and expand the perception range. However, in CP the ego CAV needs to receive messages from its collaborators, which leaves it vulnerable to attacks by malicious agents. For example, a malicious agent can send harmful information to the ego CAV to mislead it. To address this critical issue, we propose CP-Guard, a tailored defense mechanism for CP that can be deployed by each agent to accurately detect and eliminate malicious agents in its collaboration network. Our key idea is that CP should lead to a consensus with, rather than a conflict against, the ego CAV's perception results. Based on this idea, we first develop a probability-agnostic sample consensus (PASAC) method that can effectively sample a subset of the collaborators and verify the consensus without prior probabilities of malicious agents. Furthermore, we design a collaborative consistency loss (CCLoss) to calculate the discrepancy between the ego CAV and the collaborators, which is used as a verification criterion for consensus. Finally, we conduct extensive experiments on collaborative bird's eye view (BEV) tasks, and the results demonstrate the effectiveness of our CP-Guard.



#11 Speedup Techniques for Switchable Temporal Plan Graph Optimization

Authors: He Jiang, Muhan Lin, Jiaoyang Li

Multi-Agent Path Finding (MAPF) focuses on planning collision-free paths for multiple agents. However, during the execution of a MAPF plan, agents may encounter unexpected delays, which can lead to inefficiencies, deadlocks, or even collisions. To address these issues, the Switchable Temporal Plan Graph provides a framework for finding an acyclic Temporal Plan Graph with the minimum execution cost under delays, ensuring deadlock- and collision-free execution. Unfortunately, existing optimal algorithms, such as Mixed Integer Linear Programming and Graph-Based Switchable Edge Search (GSES), are often too slow for practical use. This paper introduces Improved GSES, which significantly accelerates GSES through four speedup techniques: stronger admissible heuristics, edge grouping, prioritized branching, and incremental implementation. Experiments conducted on four different map types with varying numbers of agents demonstrate that Improved GSES consistently achieves over twice the success rate of GSES and delivers up to a 30-fold speedup on instances where both methods successfully find solutions.



#12 An Open-Ended Learning Framework for Opponent Modeling

Authors: Yuheng Jing, Kai Li, Bingyun Liu, Haobo Fu, Qiang Fu, Junliang Xing, Jian Cheng

Opponent Modeling (OM) aims to enhance decision-making by modeling other agents in multi-agent environments. Existing works typically learn opponent models against a pre-designated fixed set of opponents during training. However, this causes poor generalization when facing unknown opponents during testing, as previously unseen opponents can exhibit out-of-distribution (OOD) behaviors that the learned opponent models cannot handle. To tackle this problem, we introduce a novel Open-Ended Opponent Modeling (OEOM) framework, which continuously generates opponents with diverse strengths and styles to reduce the possibility of OOD situations occurring during testing. Founded on population-based training and information-theoretic trajectory space diversity regularization, OEOM generates a dynamic set of opponents. This set is then fed to any OM approach to train a potentially generalizable opponent model. Building on this, we further propose a simple yet effective OM approach that naturally fits within the OEOM framework. This approach is based on in-context reinforcement learning and learns a Transformer that dynamically recognizes and responds to opponents based on their trajectories. Extensive experiments in cooperative, competitive, and mixed environments demonstrate that OEOM is an approach-agnostic framework that improves generalizability compared to training against a fixed set of opponents, regardless of OM approaches or testing opponent settings. The results also indicate that our proposed approach generally outperforms existing OM baselines.



#13 Unsupervised Translation of Emergent Communication

Authors: Ido Levy, Orr Paradise, Boaz Carmeli, Ron Meir, Shafi Goldwasser, Yonatan Belinkov

Emergent Communication (EC) provides a unique window into the language systems that emerge autonomously when agents are trained to jointly achieve shared goals. However, it is difficult to interpret EC and evaluate its relationship with natural languages (NL). This study employs unsupervised neural machine translation (UNMT) techniques to decipher ECs formed during referential games with varying task complexities, influenced by the semantic diversity of the environment. Our findings demonstrate UNMT's potential to translate EC, illustrating that task complexity characterized by semantic diversity enhances EC translatability, while higher task complexity with constrained semantic variability exhibits pragmatic EC, which, although challenging to interpret, remains suitable for translation. This research marks the first attempt, to our knowledge, to translate EC without the aid of parallel data.



#14 Efficient Communication in Multi-Agent Reinforcement Learning with Implicit Consensus Generation

Authors: Dapeng Li, Na Lou, Zhiwei Xu, Bin Zhang, Guoliang Fan

A key challenge in multi-agent collaborative tasks is reducing uncertainty about teammates to enhance cooperative performance. Explicit communication methods can reduce uncertainty about teammates, but the associated high communication costs limit their practicality. Alternatively, implicit consensus learning can promote cooperation without incurring communication costs. However, its performance declines significantly when local observations are severely limited. This paper introduces a novel multi-agent learning framework that combines the strengths of these methods. In our framework, agents generate a consensus about the group based on their local observations and then use both the consensus and local observations to produce messages. Since the consensus provides a certain level of global guidance, communication can be disabled when not essential, thereby reducing overhead. Meanwhile, communication can provide supplementary information to the consensus when necessary. Experimental results demonstrate that our algorithm significantly reduces inter-agent communication overhead while ensuring efficient collaboration.



#15 Learning Joint Behaviors with Large Variations

Authors: Tianxu Li, Kun Zhu

Cooperative Multi-Agent Reinforcement Learning (MARL) has drawn increasing interest in recent works due to its significant achievements. However, there are still some challenges impeding the learning of optimal cooperative policies, such as insufficient exploration. Prior works typically adopt mutual information-based methods to encourage exploration. However, this category of methods does not necessarily encourage agents to fully explore the joint behavior space. To address this limitation, we propose a novel objective based on learning a representation function with a Lipschitz constraint to maximize the traveled distances in the joint behavior space, encouraging agents to learn joint behaviors with large variations and leading to sufficient exploration. We further implement our method on top of QMIX. We demonstrate the effectiveness of our method by conducting experiments on the LBF, SMAC, and SMACv2 benchmarks. Our method outperforms previous methods in terms of final performance and state-action space exploration.



#16 Responsibility-aware Strategic Reasoning in Probabilistic Multi-Agent Systems

Authors: Chunyan Mu, Muhammad Najib, Nir Oren

Responsibility plays a key role in the development and deployment of trustworthy autonomous systems. In this paper, we focus on the problem of strategic reasoning in probabilistic multi-agent systems with responsibility-aware agents. We introduce the logic PATL+R, a variant of Probabilistic Alternating-time Temporal Logic. The novelty of PATL+R lies in its incorporation of modalities for causal responsibility, providing a framework for responsibility-aware multi-agent strategic reasoning. We present an approach to synthesise joint strategies that satisfy an outcome specified in PATL+R, while optimising the share of expected causal responsibility and reward. This provides a notion of balanced distribution of responsibility and reward gain among agents. To this end, we utilise the Nash equilibrium as the solution concept for our strategic reasoning problem and demonstrate how to compute responsibility-aware Nash equilibrium strategies via a reduction to parametric model checking of concurrent stochastic multi-player games.



#17 Explaining Decisions of Agents in Mixed-Motive Games

Authors: Maayan Orner, Oleg Maksimov, Akiva Kleinerman, Charles Ortiz, Sarit Kraus

In recent years, agents have become capable of communicating seamlessly via natural language and navigating in environments that involve cooperation and competition, a fact that can introduce social dilemmas. Due to the interleaving of cooperation and competition, understanding agents' decision-making in such environments is challenging, and humans can benefit from obtaining explanations. However, such environments and scenarios have rarely been explored in the context of explainable AI. While some explanation methods for cooperative environments can be applied in mixed-motive setups, they do not address inter-agent competition, cheap-talk, or implicit communication by actions. In this work, we design explanation methods to address these issues. Then, we proceed to establish generality and demonstrate the applicability of the methods to three games with vastly different properties. Lastly, we demonstrate the effectiveness and usefulness of the methods for humans in two mixed-motive games. The first is a challenging 7-player game called no-press Diplomacy. The second is a 3-player game inspired by the prisoner's dilemma, featuring communication in natural language.



#18 Optimally Solving Simultaneous-Move Dec-POMDPs: The Sequential Central Planning Approach

Authors: Johan Peralez, Aurélien Delage, Jacopo Castellini, Rafael F. Cunha, Jilles S. Dibangoye

The centralized training for decentralized execution paradigm emerged as the state-of-the-art approach to ϵ-optimally solving decentralized partially observable Markov decision processes. However, scalability remains a significant issue. This paper presents a novel and more scalable alternative, namely sequential-move centralized training for decentralized execution. This paradigm further pushes the applicability of Bellman's principle of optimality, yielding three new properties. First, it allows a central planner to reason upon sufficient sequential-move statistics instead of prior simultaneous-move ones. Next, it proves that ϵ-optimal value functions are piecewise linear and convex in such sufficient sequential-move statistics. Finally, it drops the complexity of the backup operators from double exponential to polynomial at the expense of longer planning horizons. Moreover, it makes it easy to use single-agent methods, e.g., the SARSA algorithm enhanced with these findings, while still preserving convergence guarantees. Experiments on two- as well as many-agent domains from the literature against ϵ-optimal simultaneous-move solvers confirm the superiority of our novel approach. This paradigm opens the door to efficient planning and reinforcement learning methods for multi-agent systems.



#19 Anytime Multi-Agent Path Finding with an Adaptive Delay-Based Heuristic

Authors: Thomy Phan, Benran Zhang, Shao-Hung Chan, Sven Koenig

Anytime multi-agent path finding (MAPF) is a promising approach to scalable and collision-free path optimization in multi-agent systems. MAPF-LNS, based on Large Neighborhood Search (LNS), is the current state-of-the-art approach where a fast initial solution is iteratively optimized by destroying and repairing selected paths of the solution. Current MAPF-LNS variants commonly use an adaptive selection mechanism to choose among multiple destroy heuristics. However, to determine promising destroy heuristics, MAPF-LNS requires a considerable amount of exploration time. As common destroy heuristics are stationary, i.e., non-adaptive, any performance bottleneck caused by them cannot be overcome by adaptive heuristic selection alone, thus limiting the overall effectiveness of MAPF-LNS. In this paper, we propose Adaptive Delay-based Destroy-and-Repair Enhanced with Success-based Self-learning (ADDRESS) as a single-destroy-heuristic variant of MAPF-LNS. ADDRESS applies restricted Thompson Sampling to the top-K set of the most delayed agents to select a seed agent for adaptive LNS neighborhood generation. We evaluate ADDRESS in multiple maps from the MAPF benchmark set and demonstrate cost improvements by at least 50% in large-scale scenarios with up to a thousand agents, compared with the original MAPF-LNS and other state-of-the-art methods.
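The core selection step described above, restricted Thompson Sampling over the top-K most delayed agents, can be sketched as follows. The Beta-bandit bookkeeping, names, and signature are our illustrative assumptions, not the paper's exact formulation.

```python
import random

def select_seed_agent(delays, successes, failures, k=5, rng=random):
    """Pick a seed agent for LNS neighborhood generation:
    restrict to the top-k most delayed agents, then Thompson-sample a
    Beta(successes + 1, failures + 1) posterior per candidate, where
    successes/failures count past cost improvements from destroying that
    agent's neighborhood, and pick the argmax sample."""
    top_k = sorted(delays, key=delays.get, reverse=True)[:k]
    samples = {a: rng.betavariate(successes.get(a, 0) + 1,
                                  failures.get(a, 0) + 1)
               for a in top_k}
    return max(samples, key=samples.get)
```

After each destroy-and-repair iteration, the caller would increment the chosen agent's success or failure count depending on whether the repair improved the solution cost.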



#20 REVECA: Adaptive Planning and Trajectory-Based Validation in Cooperative Language Agents Using Information Relevance and Relative Proximity

Authors: SeungWon Seo, SeongRae Noh, Junhyeok Lee, SooBin Lim, Won Hee Lee, HyeongYeop Kang

We address the challenge of multi-agent cooperation, where agents achieve a common goal by cooperating with decentralized agents under complex partial observations. Existing cooperative agent systems often struggle with efficiently processing continuously accumulating information, managing globally suboptimal planning due to lack of consideration of collaborators, and addressing false planning caused by environmental changes introduced by other collaborators. To overcome these challenges, we propose the RElevance, Proximity, and Validation-Enhanced Cooperative Language Agent (REVECA), a novel cognitive architecture powered by GPT-4o-mini. REVECA enables efficient memory management, optimal planning, and cost-effective prevention of false planning by leveraging Relevance Estimation, Adaptive Planning, and Trajectory-based Validation. Extensive experimental results demonstrate REVECA's superiority over existing methods across various benchmarks, while a user study reveals its potential for achieving trustworthy human-AI cooperation.



#21 CoDe: Communication Delay-Tolerant Multi-Agent Collaboration via Dual Alignment of Intent and Timeliness

Authors: Shoucheng Song, Youfang Lin, Sheng Han, Chang Yao, Hao Wu, Shuo Wang, Kai Lv

Communication has been widely employed to enhance multi-agent collaboration. Previous research has typically assumed delay-free communication, a strong assumption that is challenging to meet in practice. In reality, agents suffer from channel delays and receive messages sent at different time points, termed Asynchronous Communication, leading to cognitive biases and breakdowns in collaboration. This paper first defines two communication delay settings in MARL and emphasizes their harm to collaboration. To handle these delays, this paper proposes a novel framework, Communication Delay-Tolerant Multi-Agent Collaboration (CoDe). First, CoDe learns an intent representation as messages through future action inference, reflecting the stable future behavioral trends of the agents. Then, CoDe devises a dual alignment mechanism of intent and timeliness to strengthen the fusion process of asynchronous messages. In this way, agents can extract the long-term intent of others, even from delayed messages, and selectively utilize the most recent messages that are relevant to their intent. Experimental results demonstrate that CoDe outperforms baseline algorithms in three MARL benchmarks without delay and exhibits robustness under fixed and time-varying delays.



#22 A Multiagent Path Search Algorithm for Large-Scale Coalition Structure Generation

Authors: Redha Taguelmimt, Samir Aknine, Djamila Boukredera, Narayan Changder, Tuomas Sandholm

Coalition structure generation (CSG), i.e., the problem of optimally partitioning a set of agents into coalitions to maximize social welfare, is a fundamental computational problem in multiagent systems. This problem is important for many applications where small run times are necessary, including transportation and disaster response. In this paper, we develop SALDAE, a multiagent path finding algorithm for CSG that operates on a graph of coalition structures. Our algorithm utilizes a variety of heuristics and strategies to perform and guide the search. It is an anytime algorithm that can handle large problems with hundreds or thousands of agents. We show empirically on nine standard value distributions, including disaster response and electric vehicle allocation benchmarks, that our algorithm rapidly finds high-quality solutions and compares favorably with other state-of-the-art methods.
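For scale, the CSG search space grows as the Bell number of the agent count, which is why anytime search methods like SALDAE matter. The brute-force baseline below, our own illustration rather than anything from the paper, enumerates every partition and is only feasible for a handful of agents:

```python
def partitions(agents):
    """Yield every partition of a list of agents into coalitions."""
    if not agents:
        yield []
        return
    first, rest = agents[0], agents[1:]
    for smaller in partitions(rest):
        # place `first` into each existing coalition in turn
        for i, coalition in enumerate(smaller):
            yield smaller[:i] + [[first] + coalition] + smaller[i + 1:]
        # or give `first` its own singleton coalition
        yield [[first]] + smaller

def optimal_structure(agents, value):
    """Exhaustive CSG baseline: maximize social welfare sum(value(C))."""
    return max(partitions(agents),
               key=lambda cs: sum(value(frozenset(c)) for c in cs))
```

With n agents this enumerates Bell(n) structures (e.g., 5 for n = 3, but over 10^115 for n = 100), which makes the contrast with an anytime graph search on coalition structures concrete.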



#23 Windowed MAPF with Completeness Guarantees

Authors: Rishi Veerapaneni, Muhammad Suhail Saleem, Jiaoyang Li, Maxim Likhachev

Traditional multi-agent path finding (MAPF) methods try to compute entire collision-free start-goal paths, with several algorithms offering completeness guarantees. However, computing partial paths offers significant advantages including faster planning, adaptability to changes, and enabling decentralized planning. Methods that compute partial paths employ a "windowed" approach and only try to find collision-free paths for a limited timestep horizon. While this improves flexibility, this adaptation introduces incompleteness; all existing windowed approaches can become stuck in deadlock or livelock. Our main contribution is to introduce our framework, WinC-MAPF, for Windowed MAPF that enables completeness. Our framework leverages heuristic update insights from single-agent real-time heuristic search algorithms and agent independence ideas from MAPF algorithms. We also develop Single-Step Conflict Based Search (SS-CBS), an instantiation of this framework using a novel modification to CBS. We show how SS-CBS, which only plans a single step and updates heuristics, can effectively solve tough scenarios where existing windowed approaches fail.



#24 SrSv: Integrating Sequential Rollouts with Sequential Value Estimation for Multi-agent Reinforcement Learning

Authors: Xu Wan, Chao Yang, Cheng Yang, Jie Song, Mingyang Sun

Although multi-agent reinforcement learning (MARL) has shown success across diverse domains, extending its application to large-scale real-world systems still faces significant challenges. Primarily, the high complexity of real-world environments exacerbates the credit assignment problem, substantially reducing training efficiency. Moreover, the variability of agent populations in large-scale scenarios necessitates scalable decision-making mechanisms. To address these challenges, we propose a novel framework: Sequential rollout with Sequential value estimation (SrSv). This framework aims to capture agent interdependence and provide a scalable solution for cooperative MARL. Specifically, SrSv leverages the autoregressive property of the Transformer model to handle varying populations through sequential action rollout. Furthermore, to capture the interdependence of policy distributions and value functions among multiple agents, we introduce an innovative sequential value estimation methodology and integrate the value approximation into an attention-based sequential model. We evaluate SrSv on three benchmarks: Multi-Agent MuJoCo, StarCraft Multi-Agent Challenge, and DubinsCars. Experimental results demonstrate that SrSv significantly outperforms baseline methods in terms of training efficiency without compromising convergence performance. Moreover, when implemented in a large-scale DubinsCar system with 1,024 agents, our framework surpasses existing benchmarks, highlighting the excellent scalability of SrSv.



#25 LNS2+RL: Combining Multi-agent Reinforcement Learning with Large Neighborhood Search in Multi-agent Path Finding

Authors: Yutong Wang, Tanishq Duhan, Jiaoyang Li, Guillaume Sartoretti

Multi-Agent Path Finding (MAPF) is a critical component of logistics and warehouse management, which focuses on planning collision-free paths for a team of robots in a known environment. Recent work introduced a novel MAPF approach, LNS2, which repairs a quickly-obtainable set of infeasible paths via iterative re-planning, relying on a fast, yet lower-quality, priority-based planner. At the same time, there has been a recent push for Multi-Agent Reinforcement Learning (MARL) based MAPF algorithms, which let agents learn decentralized policies that exhibit improved cooperation over such priority planning, although inevitably remaining slower. In this paper, we introduce a new MAPF algorithm, LNS2+RL, which combines the distinct yet complementary characteristics of LNS2 and MARL to effectively balance their individual limitations and get the best of both worlds. During early iterations, LNS2+RL relies on MARL for low-level re-planning, which we show eliminates collisions far more effectively than a priority-based planner. There, our MARL-based planner allows agents to reason about past and future/predicted information to gradually learn cooperative decision-making through a carefully designed curriculum. At later stages of planning, LNS2+RL adaptively switches to priority-based planning to quickly resolve the remaining collisions, naturally trading off solution quality and computational efficiency. Our comprehensive experiments on challenging tasks across various team sizes, world sizes, and map structures consistently demonstrate the superior performance of LNS2+RL compared to many MAPF algorithms, including LNS2, LaCAM, and EECBS. In maps with complex structures, the advantages of LNS2+RL are particularly pronounced, with LNS2+RL achieving a success rate of over 50% in nearly half of the tested tasks, while that of LaCAM and EECBS falls to 0%.
