Multiagent Systems

#1 TorchDriveEnv: A Reinforcement Learning Benchmark for Autonomous Driving with Reactive, Realistic, and Diverse Non-Playable Characters

Authors: Jonathan Wilder Lavington ; Ke Zhang ; Vasileios Lioutas ; Matthew Niedoba ; Yunpeng Liu ; Dylan Green ; Saeid Naderiparizi ; Xiaoxuan Liang ; Setareh Dabiri ; Adam Ścibior ; Berend Zwartsenberg ; Frank Wood

The training, testing, and deployment of autonomous vehicles require realistic and efficient simulators. Moreover, because the problems posed by different autonomous systems vary widely, these simulators need to be easy to use and easy to modify. To address these needs, we introduce TorchDriveSim and its benchmark extension TorchDriveEnv. TorchDriveEnv is a lightweight reinforcement learning benchmark programmed entirely in Python, which can be modified to test a number of different factors in learned vehicle behavior, including the effect of varying kinematic models, agent types, and traffic control patterns. Most importantly, unlike many replay-based simulation approaches, TorchDriveEnv is fully integrated with a state-of-the-art behavioral simulation API. This allows users to train and evaluate driving models alongside data-driven Non-Playable Characters (NPCs) whose initializations and driving behavior are reactive, realistic, and diverse. We illustrate the efficiency and simplicity of TorchDriveEnv by evaluating common reinforcement learning baselines in both training and validation environments. Our experiments show that TorchDriveEnv is easy to use but difficult to solve.

#2 Iterative Experience Refinement of Software-Developing Agents

Authors: Chen Qian ; Jiahao Li ; Yufan Dang ; Wei Liu ; YiFei Wang ; Zihao Xie ; Weize Chen ; Cheng Yang ; Yingli Zhang ; Zhiyuan Liu ; Maosong Sun

Autonomous agents powered by large language models (LLMs) show significant potential for achieving high autonomy in various scenarios such as software development. Recent research has shown that LLM agents can leverage past experiences to reduce errors and enhance efficiency. However, the static experience paradigm, reliant on a fixed collection of past experiences acquired heuristically, lacks iterative refinement and thus hampers agents' adaptability. In this paper, we introduce the Iterative Experience Refinement framework, enabling LLM agents to refine experiences iteratively during task execution. We propose two fundamental patterns: the successive pattern, refining based on nearest experiences within a task batch, and the cumulative pattern, acquiring experiences across all previous task batches. Augmented with our heuristic experience elimination, the method prioritizes high-quality and frequently used experiences, effectively managing the experience space and enhancing efficiency. Extensive experiments show that while the successive pattern may yield superior results, the cumulative pattern provides more stable performance. Moreover, experience elimination achieves better performance while using only a high-quality subset amounting to 11.54% of the experiences.
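
The difference between the two patterns, and the role of elimination, can be sketched in Python. This is an illustrative interpretation only: the `Experience` record, its `quality` and `uses` fields, the reading of "nearest experiences" as those from the immediately preceding batch, and the quality-times-usage ranking in `eliminate` are all hypothetical choices, not the authors' implementation.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Experience:
    key: str        # hypothetical identifier for a reusable solution step
    quality: float  # hypothetical quality score in [0, 1]
    uses: int       # how often agents retrieved this experience


def next_pool(pool, batch_experiences, pattern):
    """Experience pool handed to the next task batch (interpretation, not the paper's code)."""
    if pattern == "successive":
        # Successive: refine only from the most recent (nearest) batch.
        return list(batch_experiences)
    if pattern == "cumulative":
        # Cumulative: keep experiences gathered across all previous batches.
        return list(pool) + list(batch_experiences)
    raise ValueError(f"unknown pattern: {pattern!r}")


def eliminate(pool, keep_fraction):
    """Hypothetical elimination rule: rank by quality weighted by usage, keep the top fraction."""
    ranked = sorted(pool, key=lambda e: e.quality * (1 + e.uses), reverse=True)
    keep = max(1, math.ceil(len(ranked) * keep_fraction))
    return ranked[:keep]


batch1 = [Experience("A", 0.9, 3), Experience("B", 0.2, 0)]
batch2 = [Experience("C", 0.5, 1)]

successive = next_pool(next_pool([], batch1, "successive"), batch2, "successive")
cumulative = next_pool(next_pool([], batch1, "cumulative"), batch2, "cumulative")
pruned = eliminate(cumulative, keep_fraction=0.4)
print([e.key for e in successive], [e.key for e in cumulative], [e.key for e in pruned])
```

Under this reading, the successive pool only ever holds the latest batch, while the cumulative pool keeps growing and relies on elimination to stay manageable.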

#3 Unified End-to-End V2X Cooperative Autonomous Driving

Authors: Zhiwei Li ; Bozhen Zhang ; Lei Yang ; Tianyu Shen ; Nuo Xu ; Ruosen Hao ; Weiting Li ; Tao Yan ; Huaping Liu

V2X cooperation, which integrates sensor data from both vehicles and infrastructure, is considered a pivotal approach to advancing autonomous driving technology. Current research primarily focuses on enhancing perception accuracy and often overlooks the systematic improvement of accident prediction accuracy through end-to-end learning, so the safety issues of autonomous driving receive insufficient attention. To address this challenge, this paper introduces the UniE2EV2X framework, a V2X-integrated end-to-end autonomous driving system that consolidates key driving modules within a unified network. The framework employs a deformable attention-based data fusion strategy that effectively facilitates cooperation between vehicles and infrastructure. Its main advantages are: 1) significantly enhanced perception and motion prediction capabilities for agents, which improve the accuracy of accident prediction; 2) high reliability in the data fusion process; and 3) superior end-to-end perception compared with modular approaches. We implement the UniE2EV2X framework on DeepAccident, a challenging simulation dataset designed for V2X cooperative driving.

#4 Select to Perfect: Imitating desired behavior from large multi-agent data

Authors: Tim Franzmeyer ; Edith Elkind ; Philip Torr ; Jakob Foerster ; Joao Henriques

AI agents are commonly trained with large datasets of demonstrations of human behavior. However, not all behaviors are equally safe or desirable. Desired characteristics for an AI agent can be expressed by assigning desirability scores, which we assume are not assigned to individual behaviors but to collective trajectories. For example, in a dataset of vehicle interactions, these scores might relate to the number of incidents that occurred. We first assess the effect of each individual agent's behavior on the collective desirability score, e.g., assessing how likely an agent is to cause incidents. This allows us to selectively imitate agents with a positive effect, e.g., only imitating agents that are unlikely to cause incidents. To enable this, we propose the concept of an agent's Exchange Value, which quantifies an individual agent's contribution to the collective desirability score. The Exchange Value is the expected change in desirability score when the agent is replaced by a randomly selected agent. We propose additional methods for estimating Exchange Values from real-world datasets, enabling us to learn desired imitation policies that outperform relevant baselines. The project website can be found at https://tinyurl.com/select-to-perfect.
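
A minimal sketch of the Exchange Value definition, assuming a toy setting in which every group of a fixed size can be scored exactly. The function `exchange_values`, the fixed `group_size`, the uniform choice of the substitute among agents not already in the group, and the incident-based score are illustrative assumptions; the paper's estimators for real-world datasets, where most groups are never observed, are not reproduced here.

```python
from itertools import combinations
from statistics import mean


def exchange_values(agents, group_size, score):
    """Exact Exchange Value of every agent by enumerating all groups.

    Assumptions (illustrative only): every group of `group_size` agents can be
    scored, and the substitute is drawn uniformly from agents not already in
    the remaining group (which can include the original agent itself).
    """
    values = {}
    for i in agents:
        others = [a for a in agents if a != i]
        deltas = []
        for rest in combinations(others, group_size - 1):
            rest = frozenset(rest)
            with_i = score(rest | {i})
            # Expected score when agent i's slot is filled by a random agent.
            candidates = [j for j in agents if j not in rest]
            with_random = mean(score(rest | {j}) for j in candidates)
            deltas.append(with_i - with_random)
        values[i] = mean(deltas)
    return values


# Hypothetical data: agent 2 causes one incident in every group it joins.
CARELESS = {2}


def incidents_score(group):
    """Desirability = minus the number of incidents caused within the group."""
    return -sum(1 for a in group if a in CARELESS)


ev = exchange_values([0, 1, 2], group_size=2, score=incidents_score)
imitated = [a for a, v in ev.items() if v > 0]  # selectively imitate positive agents
print(ev, imitated)
```

In this toy example the careless agent 2 receives a negative Exchange Value and agents 0 and 1 positive ones, so selective imitation keeps only agents 0 and 1.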

#5 A Single Online Agent Can Efficiently Learn Mean Field Games

Authors: Chenyu Zhang ; Xu Chen ; Xuan Di

Mean field games (MFGs) are a promising framework for modeling the behavior of large-population systems. However, solving MFGs can be challenging due to the coupling of forward population evolution and backward agent dynamics. Typically, obtaining mean field Nash equilibria (MFNE) involves an iterative approach where the forward and backward processes are solved alternately, known as fixed-point iteration (FPI). This method requires fully observed population propagation and agent dynamics over the entire spatial domain, which could be impractical in some real-world scenarios. To overcome this limitation, this paper introduces a novel online single-agent model-free learning scheme, which enables a single agent to learn MFNE using online samples, without prior knowledge of the state-action space, reward function, or transition dynamics. Specifically, the agent updates its policy through the value function (Q), while simultaneously evaluating the mean field state (M), using the same batch of observations. We develop two variants of this learning scheme: off-policy and on-policy QM iteration. We prove that both variants efficiently approximate FPI and provide a sample complexity guarantee. Numerical experiments confirm the efficacy of our methods.
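
For contrast with the proposed online scheme, classical fixed-point iteration (FPI) can be sketched on a toy stationary mean field game. Everything here is an assumption for illustration: the two-location crowd-aversion model, the discount `GAMMA`, the entropy temperature `TAU`, and the damping factor (entropy regularization and damping are common stabilizers for FPI, not part of the abstract). The sketch needs the full transition model and population distribution, which is exactly the requirement that the paper's QM iteration removes; QM iteration itself is not shown.

```python
import math

# Hypothetical toy MFG: two locations; action a moves the agent to location a.
# Crowd aversion: an agent at location s earns -mu[s] per step, where mu is the
# population distribution. GAMMA and TAU are arbitrary illustrative values.
N = 2
GAMMA = 0.9
TAU = 1.0


def q_values(v, mu):
    """Q(s, a) = r(s, mu) + GAMMA * V(next state), where the next state is a."""
    return [[-mu[s] + GAMMA * v[a] for a in range(N)] for s in range(N)]


def backward_best_response(mu, iters=500):
    """Backward step: soft (entropy-regularized) value iteration against a frozen mu."""
    v = [0.0] * N
    for _ in range(iters):
        q = q_values(v, mu)
        v = [TAU * math.log(sum(math.exp(x / TAU) for x in row)) for row in q]
    policy = []
    for row in q_values(v, mu):
        top = max(row)
        weights = [math.exp((x - top) / TAU) for x in row]
        total = sum(weights)
        policy.append([w / total for w in weights])
    return policy


def forward_population(policy, iters=200):
    """Forward step: propagate the population under the policy until it settles."""
    mu = [1.0 / N] * N
    for _ in range(iters):
        nxt = [0.0] * N
        for s in range(N):
            for a in range(N):
                nxt[a] += mu[s] * policy[s][a]  # action a leads to state a
        mu = nxt
    return mu


def fixed_point_iteration(mu, rounds=200, damping=0.5, tol=1e-12):
    """Alternate backward and forward steps; damping stabilizes the plain FPI update."""
    for _ in range(rounds):
        policy = backward_best_response(mu)
        target = forward_population(policy)
        new_mu = [(1 - damping) * m + damping * t for m, t in zip(mu, target)]
        if max(abs(a - b) for a, b in zip(new_mu, mu)) < tol:
            return new_mu, policy
        mu = new_mu
    return mu, policy


mu_star, pi_star = fixed_point_iteration([0.9, 0.1])
print(mu_star, pi_star)
```

By symmetry the equilibrium splits the population evenly between the two locations, and the damped iteration converges to it from the skewed start `[0.9, 0.1]`.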

#6 A Guide to Re-Implementing Agent-based Models: Experiences from the HUMAT Model

Authors: Önder Gürcan ; Timo Szczepanska ; Patrycja Antosz

Replicating existing agent-based models poses significant challenges, particularly for those new to the field. This article presents a comprehensive guide to re-implementing agent-based models, covering key concepts such as understanding the original model, using agent-based modeling frameworks, simulation design, model validation, and more. By following the proposed guide, researchers and practitioners can gain a thorough understanding of the entire re-implementation process, leading to more accurate and reliable simulations of complex systems. Furthermore, this article showcases the re-implementation of the HUMAT socio-cognitive architecture, with a specific focus on designing a versatile, language-independent model. The challenges and pitfalls encountered in the re-implementation process are discussed in detail, giving readers practical insights. The guide aims to speed up model development while ensuring robust and precise simulations.
