AAAI.2021 - Multiagent Systems

| Total: 20

#1 Improving Continuous-time Conflict Based Search

Authors: Anton Andreychuk, Konstantin Yakovlev, Eli Boyarski, Roni Stern

Conflict-Based Search (CBS) is a powerful algorithmic framework for optimally solving classical multi-agent path finding (MAPF) problems, where time is discretized into time steps. Continuous-time CBS (CCBS) is a recently proposed version of CBS that guarantees optimal solutions without the need to discretize time. However, the scalability of CCBS is limited because it does not include any known improvements of CBS. In this paper, we begin to close this gap and explore how to adapt successful CBS improvements, namely, prioritizing conflicts (PC), disjoint splitting (DS), and high-level heuristics, to the continuous-time setting of CCBS. These adaptations are not trivial, and require careful handling of different types of constraints, applying a generalized version of the Safe Interval Path Planning (SIPP) algorithm, and extending the notion of cardinal conflicts. We evaluate the effect of the suggested enhancements by running experiments both on general graphs and 2^k-neighborhood grids. CCBS with these improvements significantly outperforms vanilla CCBS, solving problems with almost twice as many agents in some cases and pushing the limits of multi-agent path finding in continuous-time domains.
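
To make the prioritizing-conflicts (PC) enhancement concrete, here is a minimal sketch of the high-level CBS loop with prioritized conflict selection. This is not the authors' CCBS implementation; `low_level`, `detect_conflicts`, `classify`, and `split` are hypothetical callbacks standing in for the components the abstract names.

```python
import heapq
from itertools import count

# Minimal high-level CBS loop with prioritized conflict selection (a
# sketch, not the authors' CCBS code). The callbacks are hypothetical:
#   low_level(agent, constraints) -> path or None
#   detect_conflicts(paths)       -> list of conflicts
#   classify(conflict)            -> rank (cardinal < semi- < non-cardinal)
#   split(conflict)               -> [(agent, new_constraint), ...]
def cbs(agents, low_level, detect_conflicts, classify, split):
    tie = count()  # tie-breaker so heap entries never compare dicts
    paths = {a: low_level(a, []) for a in agents}
    root_cost = sum(len(p) for p in paths.values())
    open_list = [(root_cost, next(tie), [], paths)]
    while open_list:
        _, _, constraints, paths = heapq.heappop(open_list)
        conflicts = detect_conflicts(paths)
        if not conflicts:
            return paths  # collision-free and expanded best-first
        # Prioritizing conflicts: resolve a cardinal conflict first, since
        # resolving it is guaranteed to raise the cost of both children.
        conflict = min(conflicts, key=classify)
        for agent, new_constraint in split(conflict):
            child = constraints + [new_constraint]
            new_paths = dict(paths)
            new_paths[agent] = low_level(agent, child)
            if new_paths[agent] is not None:
                cost = sum(len(p) for p in new_paths.values())
                heapq.heappush(open_list, (cost, next(tie), child, new_paths))
    return None  # no solution under the given constraints
```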


#2 Inference-Based Deterministic Messaging For Multi-Agent Communication

Authors: Varun Bhatt, Michael Buro

Communication is essential for coordination among humans and animals. Therefore, with the introduction of intelligent agents into the world, agent-to-agent and agent-to-human communication becomes necessary. In this paper, we first study learning in matrix-based signaling games to empirically show that decentralized methods can converge to a suboptimal policy. We then propose a modification to the messaging policy, in which the sender deterministically chooses the best message that helps the receiver to infer the sender's observation. Using this modification, we see, empirically, that the agents converge to the optimal policy in nearly all the runs. We then apply this method to a partially observable gridworld environment which requires cooperation between two agents and show that, with appropriate approximation methods, the proposed sender modification can enhance existing decentralized training methods for more complex domains as well.
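
A toy sketch of the deterministic messaging rule, under assumed names: given a receiver model P(observation | message), the sender deterministically picks the message under which its true observation is most identifiable. The model and game sizes below are illustrative, not the authors' setup.

```python
import numpy as np

# Hypothetical receiver inference model P(obs | msg), e.g. estimated from
# past interaction; rows index messages, columns index observations.
rng = np.random.default_rng(0)
n_obs, n_msgs = 4, 4
posterior = rng.dirichlet(np.ones(n_obs), size=n_msgs)

def send(observation: int) -> int:
    # Deterministic choice: the message under which the receiver assigns
    # the highest probability to the sender's true observation.
    return int(np.argmax(posterior[:, observation]))

for obs in range(n_obs):
    print(f"observation {obs} -> message {send(obs)}")
```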


#3 Scalable and Safe Multi-Agent Motion Planning with Nonlinear Dynamics and Bounded Disturbances

Authors: Jingkai Chen, Jiaoyang Li, Chuchu Fan, Brian C. Williams

We present a scalable and effective multi-agent safe motion planner that enables a group of agents to move to their desired locations while avoiding collisions with obstacles and other agents, in the presence of rich obstacles; high-dimensional, nonlinear, nonholonomic dynamics; actuation limits; and disturbances. We address this problem by finding a piecewise linear path for each agent such that the actual trajectories following these paths are guaranteed to satisfy the reach-and-avoid requirement. We show that the spatial tracking error of the actual trajectories of the controlled agents can be pre-computed for any qualified path that respects the minimum duration of each path segment imposed by the actuation limits. Using these bounds, we find a collision-free path for each agent by solving Mixed-Integer Linear Programs and coordinate the agents using priority-based search. We demonstrate our method by benchmarking in 2D and 3D scenarios with ground vehicles and quadrotors, respectively, and show improvements in solving time and solution quality over two state-of-the-art multi-agent motion planners.
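
The core safety mechanism, pre-computed tracking-error bounds used to inflate obstacles, can be sketched as follows; `plan_piecewise_linear` (the MILP-based planner) and the shapely-style `buffer` geometry call are hypothetical stand-ins.

```python
# A sketch of planning with pre-computed tracking-error bounds. Any
# controlled trajectory tracking the returned path is assumed to stay
# within eps of it, so inflating every obstacle by eps yields the
# reach-and-avoid guarantee described above.

def safe_plan(start, goal, obstacles, eps, plan_piecewise_linear):
    inflated = [ob.buffer(eps) for ob in obstacles]  # grow obstacles by eps
    return plan_piecewise_linear(start, goal, inflated)
```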


#4 Learning to Resolve Conflicts for Multi-Agent Path Finding with Conflict-Based Search

Authors: Taoan Huang, Sven Koenig, Bistra Dilkina

Conflict-Based Search (CBS) is a state-of-the-art algorithm for multi-agent path finding. At the high level, CBS repeatedly detects conflicts and resolves one of them by splitting the current problem into two subproblems. Previous work chooses the conflict to resolve by categorizing conflicts into three classes and always picking one from the highest-priority class. In this work, we propose an oracle for conflict selection that results in smaller search tree sizes than the one used in previous work. However, the computation of the oracle is slow. Thus, we propose a machine-learning (ML) framework for conflict selection that observes the decisions made by the oracle and learns a conflict-selection strategy represented by a linear ranking function that imitates the oracle's decisions accurately and quickly. Experiments on benchmark maps indicate that our approach, ML-guided CBS, significantly improves the success rates, search tree sizes, and runtimes of the current state-of-the-art CBS solver.
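
At inference time, the learned strategy reduces to scoring conflicts with a linear function, roughly as in this sketch; the feature extractor and the weight vector `w` (imagined to be fit by imitating the oracle) are assumptions.

```python
import numpy as np

# Linear ranking for conflict selection: score each detected conflict and
# resolve the highest-ranked one. `features(conflict)` and `w` are
# hypothetical; at training time w would be fit so the oracle's choice
# ranks first.

def select_conflict(conflicts, features, w):
    scores = [float(np.dot(w, features(c))) for c in conflicts]
    return conflicts[int(np.argmax(scores))]
```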


#5 The Influence of Memory in Multi-Agent Consensus

Authors: David Kohan Marzagão, Luciana Basualdo Bonatto, Tiago Madeira, Marcelo Matheus Gauy, Peter McBurney

Multi-agent consensus problems can often be seen as a sequence of autonomous and independent local choices between a finite set of decision options, with each local choice undertaken simultaneously, and with a shared goal of achieving a global consensus state. Being able to estimate probabilities for the different outcomes and to predict how long it takes for a consensus to be formed, if ever, are core issues for such protocols. Little attention has been given to protocols in which agents can remember past or outdated states. In this paper, we propose a framework to study what we call 'memory consensus protocols'. We show that employing memory allows such processes to always converge and, in some scenarios such as cycles, to converge faster. We provide a theoretical analysis of the probability of each option eventually winning such processes based on the initial opinions expressed by the agents. Further, we perform experiments to investigate network topologies in which agents benefit from memory in terms of the expected time needed for consensus.
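
A toy simulation in the spirit of a memory consensus protocol: each agent adopts either the current or the remembered previous opinion of a random neighbour. The 1/2 memory rule is an illustrative assumption, not the paper's exact protocol.

```python
import random

# Synchronous opinion dynamics with one step of memory. On a bipartite
# cycle with alternating opinions the memoryless variant can oscillate
# forever; drawing from remembered states breaks that parity.

def memory_consensus(adjacency, opinions, steps=10_000, seed=0):
    rng = random.Random(seed)
    previous = list(opinions)
    for _ in range(steps):
        if len(set(opinions)) == 1:
            return opinions[0]  # consensus reached
        nxt = []
        for i in range(len(opinions)):
            j = rng.choice(adjacency[i])
            # With probability 1/2, use the neighbour's remembered state.
            nxt.append(previous[j] if rng.random() < 0.5 else opinions[j])
        previous, opinions = opinions, nxt
    return None  # no consensus within the step budget

# Example: a 4-cycle with alternating initial opinions.
ring = {0: [1, 3], 1: [0, 2], 2: [1, 3], 3: [2, 0]}
print(memory_consensus(ring, [0, 1, 0, 1]))
```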


#6 Exploration-Exploitation in Multi-Agent Learning: Catastrophe Theory Meets Game Theory

Authors: Stefanos Leonardos, Georgios Piliouras

Exploration-exploitation is a powerful and practical tool in multi-agent learning (MAL); however, its effects are far from understood. To make progress in this direction, we study a smooth analogue of Q-learning. We start by showing that our learning model has strong theoretical justification as an optimal model for studying exploration-exploitation. Specifically, we prove that smooth Q-learning has bounded regret in arbitrary games for a cost model that explicitly captures the balance between game and exploration costs, and that it always converges to the set of quantal-response equilibria (QRE), the standard solution concept for games under bounded rationality, in weighted potential games with heterogeneous learning agents. In our main task, we then turn to measuring the effect of exploration on collective system performance. We characterize the geometry of the QRE surface in low-dimensional MAL systems and link our findings with catastrophe (bifurcation) theory. In particular, as the exploration hyperparameter evolves over time, the system undergoes phase transitions where the number and stability of equilibria can change radically given an infinitesimal change to the exploration parameter. Based on this, we provide a formal theoretical treatment of how tuning the exploration parameter can provably lead to equilibrium selection, with both positive and negative (and potentially unbounded) effects on system performance.
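
The QRE solution concept can be illustrated by iterating the logit (softmax) response map in a small game; the coordination game, the starting point, and the plain fixed-point iteration below are toy assumptions, with the temperature T playing the role of the exploration parameter.

```python
import numpy as np

# Logit QRE in a 2x2 coordination game via fixed-point iteration.
A = np.array([[1.0, 0.0], [0.0, 1.0]])  # row player's payoffs
B = A.copy()                            # column player's payoffs

def softmax(u, T):
    z = np.exp((u - u.max()) / T)
    return z / z.sum()

def logit_qre(T, iters=10_000):
    x = np.array([0.6, 0.4])  # slightly asymmetric start
    y = np.array([0.5, 0.5])
    for _ in range(iters):
        y = softmax(B.T @ x, T)  # column player's logit response
        x = softmax(A @ y, T)    # row player's logit response
    return x, y

# Low exploration selects a near-pure equilibrium; high exploration pulls
# the system toward the uniform QRE, echoing the bifurcation picture above.
for T in (0.1, 1.0, 10.0):
    print(T, logit_qre(T))
```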


#7 Lifelong Multi-Agent Path Finding in Large-Scale Warehouses

Authors: Jiaoyang Li, Andrew Tinka, Scott Kiesel, Joseph W. Durham, T. K. Satish Kumar, Sven Koenig

Multi-Agent Path Finding (MAPF) is the problem of moving a team of agents to their goal locations without collisions. In this paper, we study the lifelong variant of MAPF, where agents are constantly engaged with new goal locations, such as in large-scale automated warehouses. We propose a new framework Rolling-Horizon Collision Resolution (RHCR) for solving lifelong MAPF by decomposing the problem into a sequence of Windowed MAPF instances, where a Windowed MAPF solver resolves collisions among the paths of the agents only within a bounded time horizon and ignores collisions beyond it. RHCR is particularly well suited to generating pliable plans that adapt to continually arriving new goal locations. We empirically evaluate RHCR with a variety of MAPF solvers and show that it can produce high-quality solutions for up to 1,000 agents (= 38.9% of the empty cells on the map) for simulated warehouse instances, significantly outperforming existing work.
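
A schematic of the RHCR control flow, with hypothetical stubs: `windowed_mapf` resolves collisions only within the horizon w, `assign_goal` supplies new goals, and agents expose `at_goal` and `step`. This sketches the framework's loop, not the authors' code.

```python
# Rolling-Horizon Collision Resolution, schematically: replan every h
# steps, where each plan is guaranteed collision-free for w >= h steps
# and conflicts beyond the window are deliberately ignored.

def rhcr(agents, windowed_mapf, assign_goal, w=10, h=5, sim_steps=1000):
    t = 0
    while t < sim_steps:
        for agent in agents:
            if agent.at_goal():
                agent.goal = assign_goal(agent)  # lifelong: new goal arrives
        paths = windowed_mapf(agents, horizon=w)  # Windowed MAPF instance
        for _ in range(h):  # execute only the first h steps, then replan
            for agent in agents:
                agent.step(paths[agent])
            t += 1
```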


#8 Dec-SGTS: Decentralized Sub-Goal Tree Search for Multi-Agent Coordination

Authors: Minglong Li, Zhongxuan Cai, Wenjing Yang, Lixia Wu, Yinghui Xu, Ji Wang

Multi-agent coordination tends to benefit from efficient communication, where cooperation often happens by exchanging information about what the agents intend to do, i.e., intention sharing. Modeling intentions at a proper level of abstraction thus becomes a key problem: current representations are either too coarse, such as final goals, or too fine-grained, such as primitive steps, and are inefficient due to the lack of modularity and semantics. In this paper, we design a novel multi-agent coordination protocol based on subgoal intentions, defined as probability distributions over feasible subgoal sequences. Subgoal intentions encode macro-action behaviors with modularity so as to facilitate joint decision making at a higher level of abstraction. Built on the proposed protocol, we present Dec-SGTS (Decentralized Sub-Goal Tree Search) to solve decentralized online multi-agent planning hierarchically and efficiently. Each agent runs Dec-SGTS asynchronously by iteratively performing three phases: local subgoal tree search, local subgoal intention update, and global subgoal intention sharing. We conduct experiments on the courier dispatching problem, and the results show that Dec-SGTS achieves much better reward while enjoying a significant reduction in planning time and communication cost compared with Dec-MCTS (Decentralized Monte Carlo Tree Search).
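
The three phases can be summarized as a control-flow sketch; every name below is a placeholder rather than the authors' API.

```python
# One asynchronous Dec-SGTS iteration, schematically.

def dec_sgts_iteration(agent):
    tree = agent.local_subgoal_tree_search()        # phase 1: plan locally
    agent.intention = agent.update_intention(tree)  # phase 2: distribution
                                                    #   over subgoal sequences
    agent.broadcast(agent.intention)                # phase 3: share globally
```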


#9 Expected Value of Communication for Planning in Ad Hoc Teamwork

Authors: William Macke, Reuth Mirsky, Peter Stone

A desirable goal for autonomous agents is to be able to coordinate on the fly with previously unknown teammates. Known as “ad hoc teamwork”, enabling such a capability has been receiving increasing attention in the research community. One of the central challenges in ad hoc teamwork is quickly recognizing the current plans of other agents and planning accordingly. In this paper, we focus on the scenario in which teammates can communicate with one another, but only at a cost. Thus, they must carefully balance plan recognition based on observations against plan recognition based on communication. This paper proposes a new metric for evaluating how similar two policies that a teammate may be following are: the Expected Divergence Point (EDP). We then present a novel planning algorithm for ad hoc teamwork that determines which query to ask and plans accordingly. We demonstrate the effectiveness of this algorithm on a range of increasingly general communication problems in ad hoc teamwork.
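
Reading the EDP as the expected first timestep at which two policies would choose different actions, a Monte Carlo estimate might look like the following sketch; the environment interface and the deterministic-policy assumption are illustrative.

```python
# Toy EDP estimate between two deterministic policies via rollouts.
# Assumed interface: env.reset() -> state, env.step(action) -> next state.

def estimate_edp(policy_a, policy_b, env, episodes=1000, horizon=50):
    total = 0.0
    for _ in range(episodes):
        state = env.reset()
        t = horizon  # default: no divergence within the horizon
        for step in range(horizon):
            if policy_a(state) != policy_b(state):
                t = step  # first timestep where the policies diverge
                break
            state = env.step(policy_a(state))
        total += t
    return total / episodes
```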


#10 Time-Independent Planning for Multiple Moving Agents

Authors: Keisuke Okumura, Yasumasa Tamura, Xavier Défago

Typical Multi-Agent Path Finding (MAPF) solvers assume that agents move synchronously, thus neglecting the reality gap in timing assumptions, e.g., delays caused by imperfect execution of asynchronous moves. So far, two policies have been used to enforce robust execution of MAPF plans taken as input: either forcing agents to synchronize or executing plans while preserving temporal dependencies. This paper proposes an alternative approach, called time-independent planning, which is both online and distributed. We represent reality as a transition system that changes configurations according to atomic actions of agents, and use it to generate a time-independent schedule. Empirical results in a simulated environment with stochastic delays of agents' moves support the validity of our proposal.
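
A toy rendering of the transition-system view: the global configuration changes only through atomic agent actions, in whatever order execution happens to interleave them, with no global clock. The action model below is an illustrative assumption.

```python
import random

# Asynchronous execution of atomic actions under stochastic delays:
# actions fire in a random interleaving, each making one atomic change
# to the global configuration.

def execute(config, pending, seed=0):
    rng = random.Random(seed)
    while pending:
        i = rng.randrange(len(pending))  # stochastic delays reorder actions
        agent, action = pending.pop(i)
        config = action(config, agent)   # atomic configuration change
    return config
```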


#11 Resilient Multi-Agent Reinforcement Learning with Adversarial Value Decomposition

Authors: Thomy Phan, Lenz Belzner, Thomas Gabor, Andreas Sedlmeier, Fabian Ritz, Claudia Linnhoff-Popien

We focus on resilience in cooperative multi-agent systems, where agents can change their behavior due to updates or failures of hardware and software components. Current state-of-the-art approaches to cooperative multi-agent reinforcement learning (MARL) have either focused on idealized settings without any changes or on very specialized scenarios where the number of changing agents is fixed, e.g., in extreme cases with only one productive agent. Therefore, we propose Resilient Adversarial value Decomposition with Antagonist-Ratios (RADAR). RADAR offers a value decomposition scheme to train competing teams of varying size for improved resilience against arbitrary agent changes. We evaluate RADAR in two cooperative multi-agent domains and show that RADAR achieves better worst-case performance w.r.t. arbitrary agent changes than state-of-the-art MARL.
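
One way to picture training competing teams of varying size is to resample the team split each episode, roughly as in this sketch; the ratio set and sampling rule are assumptions, not RADAR's exact scheme.

```python
import random

# Per-episode team sampling: draw an antagonist ratio and split the
# agents into cooperating protagonists and adversarial antagonists.

def sample_teams(agents, antagonist_ratios=(0.0, 0.25, 0.5), seed=None):
    rng = random.Random(seed)
    k = int(rng.choice(antagonist_ratios) * len(agents))
    antagonists = rng.sample(agents, k)
    protagonists = [a for a in agents if a not in antagonists]
    return protagonists, antagonists
```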


#12 Anytime Heuristic and Monte Carlo Methods for Large-Scale Simultaneous Coalition Structure Generation and Assignment

Authors: Fredrik Präntare, Herman Appelgren, Fredrik Heintz

Optimal simultaneous coalition structure generation and assignment is computationally hard. The state of the art can only compute solutions to problems with severely limited input sizes, and no effective approximation algorithms that are guaranteed to yield high-quality solutions are expected to exist. Real-world optimization problems, however, are often characterized by large-scale inputs and the need for generating feasible solutions of high quality in limited time. In light of this, and to make it possible to generate better feasible solutions for difficult large-scale problems efficiently, we present and benchmark several different anytime algorithms that use general-purpose heuristics and Monte Carlo techniques to guide search. We evaluate our methods using synthetic problem sets of varying distribution and complexity. Our results show that the presented algorithms are superior to previous methods at quickly generating near-optimal solutions for small-scale problems, and greatly superior at efficiently finding high-quality solutions for large-scale problems. For example, for problems with a thousand agents and values generated from a uniform distribution, our best approach generates solutions that attain 99.5% of the expected optimal value within seconds. For these problems, the state-of-the-art solvers fail to find any feasible solutions at all.
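
A minimal Monte Carlo anytime baseline for this problem simply samples random assignments and keeps the incumbent best, as sketched below; `value` is an assumed problem-specific oracle, and the authors' algorithms add heuristic guidance on top of ideas of this kind.

```python
import random

# Anytime Monte Carlo baseline for simultaneous coalition structure
# generation and assignment: sample random agent-to-task assignments and
# keep the best found so far; the incumbent only ever improves.

def monte_carlo_csga(agents, tasks, value, samples=100_000, seed=0):
    rng = random.Random(seed)
    best, best_value = None, float("-inf")
    for _ in range(samples):
        assignment = {t: [] for t in tasks}
        for a in agents:
            assignment[rng.choice(tasks)].append(a)
        v = sum(value(frozenset(c), t) for t, c in assignment.items())
        if v > best_value:
            best, best_value = assignment, v
    return best, best_value
```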


#13 Newton Optimization on Helmholtz Decomposition for Continuous Games

Authors: Giorgia Ramponi, Marcello Restelli

Many learning problems involve multiple agents optimizing different interactive functions. In these problems, the standard policy gradient algorithms fail due to the non-stationarity of the setting and the different interests of each agent. In fact, algorithms must take into account the complex dynamics of these systems to guarantee rapid convergence towards a (local) Nash equilibrium. In this paper, we propose NOHD (Newton Optimization on Helmholtz Decomposition), a Newton-like algorithm for multi-agent learning problems based on the decomposition of the dynamics of the system into its irrotational (Potential) and solenoidal (Hamiltonian) components. This method ensures quadratic convergence in purely irrotational systems and purely solenoidal systems. Furthermore, we show that NOHD is attracted to stable fixed points in general multi-agent systems and repelled by strict saddle ones. Finally, we empirically compare NOHD's performance with that of state-of-the-art algorithms on bimatrix games and continuous gridworld environments.
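
The decomposition at the heart of the method is the split of the game Jacobian into symmetric (potential) and antisymmetric (Hamiltonian) parts, sketched numerically below; the damped Newton-like step is a generic illustration, not the authors' exact update.

```python
import numpy as np

# Helmholtz-style split of the game Jacobian J of the joint gradient field.

def decompose(J):
    S = 0.5 * (J + J.T)  # symmetric: potential (irrotational) component
    A = 0.5 * (J - J.T)  # antisymmetric: Hamiltonian (solenoidal) component
    return S, A

def newton_like_step(theta, grad, J, lr=1.0, damping=1e-3):
    # Generic Newton-like update: precondition the joint gradient with a
    # damped inverse Jacobian (a sketch, not the NOHD step itself).
    n = J.shape[0]
    return theta - lr * np.linalg.solve(J + damping * np.eye(n), grad(theta))
```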


#14 Synchronous Dynamical Systems on Directed Acyclic Graphs: Complexity and Algorithms

Authors: Daniel J. Rosenkrantz, Madhav Marathe, S. S. Ravi, Richard E. Stearns

Discrete dynamical systems serve as useful formal models to study diffusion phenomena in social networks. Motivated by applications in systems biology, several recent papers have studied algorithmic and complexity aspects of diffusion problems for dynamical systems whose underlying graphs are directed, and may contain directed cycles. Such problems can be regarded as reachability problems in the phase space of the corresponding dynamical system. We show that computational intractability results for reachability problems hold even for dynamical systems on directed acyclic graphs (dags). We also show that for dynamical systems on dags where each local function is monotone, the reachability problem can be solved efficiently.
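
For concreteness, a tiny synchronous dynamical system: each vertex updates via a local Boolean function of its in-neighbours' current states. The DAG and functions below are arbitrary examples; the OR at v2 is monotone, the case for which reachability is shown to be tractable.

```python
# One synchronous step of a discrete dynamical system: every vertex
# applies its local function to its in-neighbours' current states.

def step(state, in_neighbors, local_fn):
    return {v: local_fn[v](tuple(state[u] for u in in_neighbors[v]))
            for v in state}

# Example DAG: v2 computes OR (a monotone function) of v0 and v1.
in_nb = {"v0": [], "v1": [], "v2": ["v0", "v1"]}
fns = {"v0": lambda _: 1,               # constant local functions at sources
       "v1": lambda _: 0,
       "v2": lambda xs: int(any(xs))}
print(step({"v0": 1, "v1": 0, "v2": 0}, in_nb, fns))
```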


#15 Evolutionary Game Theory Squared: Evolving Agents in Endogenously Evolving Zero-Sum Games

Authors: Stratis Skoulakis, Tanner Fiez, Ryann Sim, Georgios Piliouras, Lillian Ratliff

The predominant paradigm in evolutionary game theory, and more generally online learning in games, is based on a clear distinction between a population of dynamic agents that interact given a fixed, static game. In this paper, we move away from the artificial divide between dynamic agents and static games, to introduce and analyze a large class of competitive settings where both the agents and the games they play evolve strategically over time. We focus on arguably the most archetypal game-theoretic setting---zero-sum games (as well as network generalizations)---and the most studied evolutionary learning dynamic---replicator, the continuous-time analogue of multiplicative weights. Populations of agents compete against each other in a zero-sum competition that itself evolves adversarially to the current population mixture. Remarkably, despite the chaotic coevolution of agents and games, we prove that the system exhibits a number of regularities. First, the system has conservation laws of an information-theoretic flavor that couple the behavior of all agents and games. Second, the system is Poincaré recurrent, with effectively all possible initializations of agents and games lying on recurrent orbits that come arbitrarily close to their initial conditions infinitely often. Third, the time-average agent behavior and utility converge to the Nash equilibrium values of the time-average game. Finally, we provide a polynomial-time algorithm to efficiently predict this time-average behavior for any such coevolving network game.
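
A sketch of the coevolving setting: replicator dynamics for both populations, with the zero-sum payoff matrix itself evolving as a function of the current mixture. The Euler integration and the coupling `dAdt` are illustrative assumptions, not the paper's precise model.

```python
import numpy as np

# Replicator dynamics in a zero-sum game whose payoff matrix A evolves
# with the population mixture; `dAdt(A, x, y)` is a hypothetical coupling.

def simulate(x, y, A, dAdt, dt=1e-3, steps=50_000):
    traj = []
    for _ in range(steps):
        ux, uy = A @ y, -A.T @ x             # zero-sum payoffs
        x = x + dt * x * (ux - x @ ux)       # replicator, the continuous-
        y = y + dt * y * (uy - y @ uy)       #   time analogue of MWU
        x, y = x / x.sum(), y / y.sum()      # project back onto the simplex
        A = A + dt * dAdt(A, x, y)           # the game evolves too
        traj.append((x.copy(), y.copy()))
    return traj
```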


#16 Value-Decomposition Multi-Agent Actor-Critics

Authors: Jianyu Su, Stephen Adams, Peter Beling

The exploitation of extra state information has been an active research area in multi-agent reinforcement learning (MARL). QMIX represents the joint action-value using a non-negative function approximator and achieves the best performance on the StarCraft II micromanagement testbed, a common MARL benchmark. However, our experiments demonstrate that, in some cases, QMIX performs sub-optimally with the A2C framework, a training paradigm that promotes algorithm training efficiency. To obtain a reasonable trade-off between training efficiency and algorithm performance, we extend value-decomposition to actor-critic methods that are compatible with A2C and propose a novel actor-critic framework, value-decomposition actor-critic (VDAC). We evaluate VDAC on the StarCraft II micromanagement task and demonstrate that the proposed framework improves median performance over other actor-critic methods. Furthermore, we use a set of ablation experiments to identify the key factors that contribute to the performance of VDAC.
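
A minimal value-decomposition critic in the spirit of VDAC: the joint state value is the sum of per-agent local values, which keeps the critic compatible with A2C-style advantage estimation. Network sizes and the simple summation mixer are simplifying assumptions.

```python
import torch

# Sum-mixed critic: V_tot = sum_i V_i(o_i), one small value head per agent.
class SumCritic(torch.nn.Module):
    def __init__(self, obs_dim, n_agents, hidden=64):
        super().__init__()
        self.local = torch.nn.ModuleList(
            torch.nn.Sequential(
                torch.nn.Linear(obs_dim, hidden), torch.nn.ReLU(),
                torch.nn.Linear(hidden, 1))
            for _ in range(n_agents))

    def forward(self, observations):  # observations: (n_agents, obs_dim)
        local_values = [net(o) for net, o in zip(self.local, observations)]
        return torch.stack(local_values).sum(dim=0)  # joint state value
```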


#17 Contract-based Inter-user Usage Coordination in Free-floating Car Sharing

Authors: Kentaro Takahira, Shigeo Matsubara

We propose a novel distributed user-car matching method, based on a contract between users, to mitigate the imbalance between vehicle distribution and demand in free-floating car sharing. Previous regulation methods involved an incentive system based on predictions of origin-destination (OD) demand obtained from past usage history. However, the difficulty these methods have in obtaining accurate data limits their applicability. To overcome this drawback, we introduce contract-based coordination between drop-off and pick-up users, in which an auction is conducted for drop-off users' intended drop-off locations. We theoretically analyze the upper bound on the efficiency of the proposed method. We also compare it with a baseline method and a non-regulation scenario on a free-floating car-sharing simulator. The experimental results show that the proposed method achieves a higher social surplus than the existing method.
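
A toy version of the auction step, here a single-item second-price auction among pick-up users for a candidate drop-off location; the mechanism details are illustrative assumptions rather than the paper's full protocol.

```python
# Single-item second-price auction: the highest bidder wins and pays the
# second-highest bid (an illustrative mechanism, not the paper's exact one).

def second_price_auction(bids):
    """bids: dict of user -> bid for a candidate drop-off location."""
    ranked = sorted(bids.items(), key=lambda kv: kv[1], reverse=True)
    winner, _ = ranked[0]
    payment = ranked[1][1] if len(ranked) > 1 else 0.0
    return winner, payment

print(second_price_auction({"u1": 3.0, "u2": 5.0, "u3": 2.0}))  # u2 pays 3.0
```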


#18 Maintenance of Social Commitments in Multiagent Systems

Authors: Pankaj Telang, Munindar P. Singh, Neil Yorke-Smith

We introduce and formalize the concept of a maintenance commitment, a kind of social commitment characterized by states whose truthhood an agent commits to maintain. This concept enables us to capture a richer variety of real-world scenarios than is possible using achievement commitments with a temporal condition. By developing a rule-based operational semantics, we study the relationship between agents' achievement and maintenance goals, achievement commitments, and maintenance commitments. We motivate a notion of coherence that captures alignment between an agent's achievement and maintenance cognitive and social constructs, and prove that, under specified conditions, the goals and commitments both of individual rational agents and of a multiagent system as a whole are coherent.


#19 Efficient Querying for Cooperative Probabilistic Commitments

Authors: Qi Zhang, Edmund H. Durfee, Satinder Singh

Multiagent systems can use commitments as the core of a general coordination infrastructure, supporting both cooperative and non-cooperative interactions. Agents whose objectives are aligned, and where one agent can help another achieve greater reward by sacrificing some of its own reward, should choose a cooperative commitment to maximize their joint reward. We present a solution to the problem of how cooperative agents can efficiently find an (approximately) optimal commitment by querying about carefully-selected commitment choices. We prove structural properties of the agents' values as functions of the parameters of the commitment specification, and develop a greedy method for composing a query with provable approximation bounds, which we empirically show can find nearly optimal commitments in a fraction of the time required by methods that lack our insights.
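
The greedy query-composition idea can be sketched as follows: grow a k-item query by repeatedly adding the candidate commitment that most increases an expected-value score. `expected_value` is an assumed evaluation oracle; the paper proves approximation bounds for a greedy scheme of this general shape.

```python
# Greedy composition of a k-item commitment query (assumes k does not
# exceed the number of candidates).

def greedy_query(candidates, k, expected_value):
    query = []
    for _ in range(k):
        best = max((c for c in candidates if c not in query),
                   key=lambda c: expected_value(query + [c]))
        query.append(best)  # keep the candidate with the largest gain
    return query
```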


#20 Coordination Between Individual Agents in Multi-Agent Reinforcement Learning

Authors: Yang Zhang, Qingyu Yang, Dou An, Chengwei Zhang

Existing multi-agent reinforcement learning (MARL) methods for determining coordination between agents focus on either global-level or neighborhood-level coordination. However, the problem of coordination between individual agents remains unsolved. Analyzing agents' roles and the correlation between individual agents is crucial for learning an optimal coordinated policy in unknown multi-agent environments. To this end, we propose an agent-level-coordination-based MARL method consisting of two parts. The first is a correlation analysis between individual agents based on the Pearson, Spearman, and Kendall correlation coefficients; the second is an agent-level coordinated training framework in which communication messages between weakly correlated agents are dropped and a correlation-based reward function is built. The proposed method is verified in four mixed cooperative-competitive environments. The experimental results show that the proposed method outperforms state-of-the-art MARL methods and can measure the correlation between individual agents accurately.
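
Computing the three named coefficients for a pair of agents' behaviour traces is straightforward with SciPy; the synthetic per-step traces here are purely illustrative stand-ins for logged agent behaviour.

```python
import numpy as np
from scipy import stats

# Pairwise correlation between two agents' behaviour traces using the
# Pearson, Spearman, and Kendall coefficients.
rng = np.random.default_rng(0)
trace_a = rng.normal(size=200)
trace_b = 0.7 * trace_a + 0.3 * rng.normal(size=200)  # correlated by design

print("Pearson: ", stats.pearsonr(trace_a, trace_b)[0])
print("Spearman:", stats.spearmanr(trace_a, trace_b)[0])
print("Kendall: ", stats.kendalltau(trace_a, trace_b)[0])
```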