AAAI.2019 - Multiagent Systems

Total: 22

#1 Verification of RNN-Based Neural Agent-Environment Systems

Authors: Michael E. Akintunde, Andreea Kevorchian, Alessio Lomuscio, Edoardo Pirovano

We introduce agent-environment systems in which the agent is stateful and executes a ReLU recurrent neural network. We define and study their verification problem by providing equivalences of recurrent and feed-forward neural networks on bounded execution traces. We give a sound and complete procedure for their verification against properties specified in a simplified version of LTL on bounded executions. We present an implementation and discuss the experimental results obtained.
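
The reduction underlying this approach is that, on traces of bounded length, a ReLU RNN agrees with a feed-forward network obtained by unrolling it. Below is a minimal sketch of that unrolling (not the authors' tool); all names and shapes are illustrative assumptions.

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def rnn_step(h, x, W_h, W_x, b):
    """One step of a ReLU RNN: h' = relu(W_h h + W_x x + b)."""
    return relu(W_h @ h + W_x @ x + b)

def unrolled_ffnn(xs, W_h, W_x, b, h0):
    """Unroll the recurrence over a bounded trace: the result is a
    feed-forward ReLU network whose input is the concatenated trace."""
    h = h0
    for x in xs:          # each loop iteration acts as one feed-forward layer
        h = rnn_step(h, x, W_h, W_x, b)
    return h

# On any trace of length k the RNN and its k-layer unrolling coincide, so
# bounded-trace properties of the RNN can be checked on the feed-forward net.
rng = np.random.default_rng(0)
n_h, n_x, k = 4, 3, 5
W_h = rng.normal(size=(n_h, n_h))
W_x = rng.normal(size=(n_h, n_x))
b = rng.normal(size=n_h)
trace = [rng.normal(size=n_x) for _ in range(k)]
print(unrolled_ffnn(trace, W_h, W_x, b, np.zeros(n_h)))
```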


#2 Bayesian Execution Skill Estimation

Authors: Christopher Archibald, Delma Nieves-Rivera

The performance of agents in many domains with continuous action spaces depends not only on their ability to select good actions to execute, but also on their ability to execute planned actions precisely. This ability, which has been called an agent’s execution skill, is an important characteristic of an agent that can have a significant impact on its success. In this paper, we address the problem of estimating the execution skill of an agent given observations of that agent acting in a domain. Each observation includes the executed action, a description of the state in which the action was executed, and the reward received, but notably excludes the action that the agent intended to execute. We previously introduced this problem and demonstrated that estimating an agent’s execution skill is possible under certain conditions. Our previous method focused entirely on the reward that the agent received from executed actions and assumed that the agent was able to select the optimal action for each state. This paper addresses the execution skill estimation problem from an entirely different perspective, focusing instead on the action that was executed. We present a Bayesian framework for reasoning about action observations and show that it is able to outperform previous methods under the same conditions. We also show that the flexibility of this framework allows it to be applied in settings where the previous limiting assumptions are not met. The success of the proposed method is demonstrated experimentally in a toy domain as well as the domain of computational billiards.
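
As a toy illustration of the Bayesian perspective, suppose execution noise is Gaussian with unknown standard deviation sigma (lower sigma means higher execution skill) and that the intended action behind each observation can be approximated. A grid posterior over sigma then takes a few lines; the Gaussian noise model and the known intended actions are simplifying assumptions for this sketch, not the paper's method.

```python
import numpy as np
from scipy.stats import norm

def skill_posterior(intended, executed, sigmas, prior=None):
    """Grid posterior over the execution-noise level sigma, assuming
    executed ~ Normal(intended, sigma^2) independently per observation."""
    prior = np.ones_like(sigmas) if prior is None else prior
    log_post = np.log(prior)
    for a_star, a in zip(intended, executed):
        log_post += norm.logpdf(a, loc=a_star, scale=sigmas)
    log_post -= log_post.max()          # stabilize before exponentiating
    post = np.exp(log_post)
    return post / post.sum()

sigmas = np.linspace(0.05, 2.0, 100)    # candidate skill levels
intended = np.array([0.0, 1.0, -0.5, 2.0])
executed = intended + np.random.default_rng(1).normal(0.0, 0.3, size=4)
post = skill_posterior(intended, executed, sigmas)
print("posterior mean sigma:", (sigmas * post).sum())
```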


#3 Consensus in Opinion Formation Processes in Fully Evolving Environments

Authors: Vincenzo Auletta, Angelo Fanelli, Diodato Ferraioli

Friedkin and Johnsen (1990) modeled opinion formation in social networks as a dynamic process which evolves in rounds: at each round each agent updates her expressed opinion to a weighted average of her innate belief and the opinions expressed in the previous round by her social neighbors. The stubbornness level of an agent represents the tendency of the agent to express an opinion close to her innate belief. Motivated by the observation that innate beliefs, stubbornness levels and even social relations can co-evolve together with the expressed opinions, we present a new model of opinion formation where the dynamics runs in a co-evolving environment. We assume that agents’ stubbornness and social relations can vary arbitrarily, while their innate beliefs slowly change as a function of the opinions they expressed in the past. We prove that, in our model, the opinion formation dynamics converges to a consensus if reasonable conditions on the structure of the social relationships and on how the personal beliefs can change are satisfied. Moreover, we discuss how this result applies in several simpler (but realistic) settings.
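
For reference, the classic (static) Friedkin-Johnsen update that this model generalizes is z_i(t+1) = λ_i s_i + (1 − λ_i) Σ_j w_ij z_j(t), where s_i is agent i's innate belief and λ_i her stubbornness level. A minimal simulation of the static dynamics looks as follows (the co-evolving beliefs, stubbornness levels and relations studied in the paper are omitted):

```python
import numpy as np

def fj_step(z, s, lam, W):
    """One Friedkin-Johnsen round: each agent mixes her innate belief s
    with the weighted average of neighbors' expressed opinions W @ z."""
    return lam * s + (1.0 - lam) * (W @ z)

def fj_dynamics(s, lam, W, rounds=200):
    z = s.copy()                      # expressed opinions start at beliefs
    for _ in range(rounds):
        z = fj_step(z, s, lam, W)
    return z

# Three agents on a line; W is row-stochastic (each row sums to 1).
W = np.array([[0.0, 1.0, 0.0],
              [0.5, 0.0, 0.5],
              [0.0, 1.0, 0.0]])
s = np.array([0.0, 0.5, 1.0])        # innate beliefs
lam = np.array([0.3, 0.1, 0.3])      # stubbornness levels
print(fj_dynamics(s, lam, W))
```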


#4 An Abstraction-Based Method for Verifying Strategic Properties in Multi-Agent Systems with Imperfect Information

Authors: Francesco Belardinelli, Alessio Lomuscio, Vadim Malvone

We investigate the verification of Multi-agent Systems against strategic properties expressed in Alternating-time Temporal Logic under the assumptions of imperfect information and perfect recall. To this end, we develop a three-valued semantics for concurrent game structures upon which we define an abstraction method. We prove that concurrent game structures with imperfect information admit perfect information abstractions that preserve three-valued satisfaction. Further, we present a refinement procedure to deal with cases where the value of a specification is undefined. We illustrate the overall procedure in a variant of the Train Gate Controller scenario under imperfect information and perfect recall.


#5 A Generic Approach to Accelerating Belief Propagation Based Incomplete Algorithms for DCOPs via a Branch-and-Bound Technique

Authors: Ziyu Chen, Xingqiong Jiang, Yanchen Deng, Dingding Chen, Zhongshi He

Belief propagation approaches, such as Max-Sum and its variants, are important methods for solving large-scale Distributed Constraint Optimization Problems (DCOPs). However, for problems with n-ary constraints, these algorithms face a huge challenge since their computational complexity scales exponentially with the number of variables a function holds. In this paper, we present a generic and easy-to-use method based on a branch-and-bound technique to address this issue, called Function Decomposing and State Pruning (FDSP). We theoretically prove that FDSP provides monotonically non-increasing upper bounds and speeds up belief propagation based incomplete DCOP algorithms without affecting solution quality. Moreover, our empirical evaluation indicates that FDSP prunes at least 97% of the search space and effectively accelerates Max-Sum, compared with the state-of-the-art.
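
The pruning idea can be sketched generically: search assignments of an n-ary function depth-first, and cut any branch whose optimistic bound cannot beat the incumbent. The chain-structured toy utility and the per-term-maximum bound below are illustrative stand-ins for FDSP's actual decomposition and bounds:

```python
import numpy as np

def branch_and_bound_max(domains, value_fn, bound_fn):
    """Generic depth-first branch-and-bound maximization over complete
    variable assignments; bound_fn(partial) must over-estimate the best
    possible completion of a partial assignment."""
    best_val, best_assign = -np.inf, None
    stack = [()]                               # partial assignments (prefixes)
    while stack:
        partial = stack.pop()
        if len(partial) == len(domains):       # complete: evaluate exactly
            val = value_fn(partial)
            if val > best_val:
                best_val, best_assign = val, partial
        elif bound_fn(partial) > best_val:     # otherwise prune this subtree
            for v in domains[len(partial)]:
                stack.append(partial + (v,))
    return best_val, best_assign

# Toy n-ary utility: unary gains plus chain synergies between neighbors.
rng = np.random.default_rng(2)
n, d = 6, 4
g = rng.normal(size=(n, d))
s = rng.normal(size=(n - 1, d, d))

def value_fn(x):
    return (sum(g[i, x[i]] for i in range(n))
            + sum(s[i, x[i], x[i + 1]] for i in range(n - 1)))

def bound_fn(x):
    """Exact value of the assigned prefix plus per-term maxima for the rest."""
    k = len(x)
    val = (sum(g[i, x[i]] for i in range(k))
           + sum(s[i, x[i], x[i + 1]] for i in range(k - 1)))
    val += sum(g[i].max() for i in range(k, n))
    val += sum(s[i].max() for i in range(max(k - 1, 0), n - 1))
    return val

print(branch_and_bound_max([range(d)] * n, value_fn, bound_fn))
```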


#6 Distributed Community Detection via Metastability of the 2-Choices Dynamics

Authors: Emilio Cruciani, Emanuele Natale, Giacomo Scornavacca

We investigate the behavior of a simple majority dynamics on networks of agents whose interaction topology exhibits a community structure. By leveraging recent advancements in the analysis of dynamics, we prove that, when the states of the nodes are randomly initialized, the system rapidly and stably converges to a configuration in which the communities maintain internal consensus on different states. This is the first analytical result on the behavior of dynamics for non-consensus problems on non-complete topologies, based on the first symmetry-breaking analysis in such a setting. Our result has several implications in different contexts in which dynamics are adopted for computational and biological modeling purposes. In the context of Label Propagation Algorithms, a class of widely used heuristics for community detection, it represents the first theoretical result on the behavior of a distributed label propagation algorithm with quasi-linear message complexity. In the context of evolutionary biology, dynamics such as the Moran process have been used to model the spread of mutations in genetic populations (Lieberman, Hauert, and Nowak 2005); our result shows that, when the probability of adoption of a given mutation by a node of the evolutionary graph depends super-linearly on the frequency of the mutation in the neighborhood of the node and the underlying evolutionary graph exhibits a community structure, there is a non-negligible probability for species differentiation to occur.
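
The 2-Choices dynamics itself is simple to state: in each round, every node samples two neighbors uniformly at random (with replacement) and adopts their state only if the two samples agree. A minimal sketch on a two-community random graph, with illustrative parameters rather than those from the paper's analysis:

```python
import random

def two_choices_round(states, neighbors, rng):
    """Synchronous 2-Choices update: every node samples two random neighbors
    (with replacement) and adopts their state only if both samples agree."""
    new_states = states[:]
    for v in range(len(states)):
        if not neighbors[v]:
            continue
        a, b = rng.choice(neighbors[v]), rng.choice(neighbors[v])
        if states[a] == states[b]:
            new_states[v] = states[a]
    return new_states

def planted_partition(n, p_in, p_out, rng):
    """Two communities of n nodes each; edge prob p_in inside, p_out across."""
    neighbors = [[] for _ in range(2 * n)]
    for u in range(2 * n):
        for v in range(u + 1, 2 * n):
            same_side = (u < n) == (v < n)
            if rng.random() < (p_in if same_side else p_out):
                neighbors[u].append(v)
                neighbors[v].append(u)
    return neighbors

rng = random.Random(3)
n = 100
neighbors = planted_partition(n, p_in=0.3, p_out=0.02, rng=rng)
states = [rng.choice([0, 1]) for _ in range(2 * n)]   # random initialization
for _ in range(50):
    states = two_choices_round(states, neighbors, rng)
# With strong enough communities, each block typically reaches internal
# consensus, and the two blocks may settle on different states.
print(sum(states[:n]), sum(states[n:]))
```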


#7 Successor Features Based Multi-Agent RL for Event-Based Decentralized MDPs

Authors: Tarun Gupta, Akshat Kumar, Praveen Paruchuri

Decentralized MDPs (Dec-MDPs) provide a rigorous framework for collaborative multi-agent sequential decision-making under uncertainty. However, their computational complexity limits their practical impact. To address this, we focus on a class of Dec-MDPs consisting of independent collaborating agents that are tied together through a global reward function that depends upon their entire histories of states and actions to accomplish joint tasks. To overcome this scalability barrier, our main contributions are: (a) we propose a new actor-critic based Reinforcement Learning (RL) approach for event-based Dec-MDPs using successor features (SF), a value function representation that decouples the dynamics of the environment from the rewards; (b) we then present Dec-ESR (Decentralized Event-based Successor Representation), which generalizes learning for event-based Dec-MDPs using SF within an end-to-end deep RL framework; (c) we also show that Dec-ESR allows useful transfer of information on related but different tasks, and hence bootstraps learning for faster convergence on new tasks; (d) for validation purposes, we test our approach on a large multi-agent coverage problem that models schedule coordination of agents in a real urban subway network, and it achieves better quality solutions than previous best approaches.
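
The successor-feature decoupling the approach builds on is standard: if rewards factor as r = φ(s, a, s′) · w, then Q^π(s, a) = ψ^π(s, a) · w, where ψ^π(s, a) = E[Σ_t γ^t φ_t | s, a] satisfies a TD-style recursion. A tabular sketch of that recursion (illustrative only; Dec-ESR itself is a deep, decentralized architecture):

```python
import numpy as np

class TabularSF:
    """Tabular successor features: psi(s, a) estimates the expected
    discounted sum of feature vectors, and Q(s, a) = psi(s, a) . w."""

    def __init__(self, n_states, n_actions, n_features, gamma=0.95, alpha=0.1):
        self.psi = np.zeros((n_states, n_actions, n_features))
        self.gamma, self.alpha = gamma, alpha

    def q_values(self, s, w):
        return self.psi[s] @ w            # rewards enter only through w

    def update(self, s, a, phi, s_next, a_next):
        """TD update: psi <- psi + alpha * (phi + gamma * psi' - psi)."""
        target = phi + self.gamma * self.psi[s_next, a_next]
        self.psi[s, a] += self.alpha * (target - self.psi[s, a])

# Because dynamics live in psi and rewards live in w, a new task with the
# same dynamics but a new reward vector w' reuses psi unchanged (transfer).
sf = TabularSF(n_states=5, n_actions=2, n_features=3)
sf.update(s=0, a=1, phi=np.array([1.0, 0.0, 0.0]), s_next=1, a_next=0)
print(sf.q_values(0, w=np.array([1.0, -0.5, 2.0])))
```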


#8 IPOMDP-Net: A Deep Neural Network for Partially Observable Multi-Agent Planning Using Interactive POMDPs

Authors: Yanlin Han, Piotr Gmytrasiewicz

This paper introduces the IPOMDP-net, a neural network architecture for multi-agent planning under partial observability. It embeds an interactive partially observable Markov decision process (I-POMDP) model and a QMDP planning algorithm that solves the model in a neural network architecture. The IPOMDP-net is fully differentiable and allows for end-to-end training. In the learning phase, we train an IPOMDP-net on various fixed and randomly generated environments in a reinforcement learning setting, assuming observable reinforcements and unknown (randomly initialized) model functions. In the planning phase, we test the trained network on new, unseen variants of the environments under the planning setting, using the trained model to plan without reinforcements. Empirical results show that our model-based IPOMDP-net outperforms a state-of-the-art model-free network and generalizes better to larger, unseen environments. Our approach provides a general neural computing architecture for multi-agent planning using I-POMDPs. It suggests that, in a multi-agent setting, having a model of other agents benefits our decision-making, resulting in a policy of higher quality and better generalizability.
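
The embedded QMDP step is a well-known approximation: solve the underlying MDP for Q(s, a), then score actions against the current belief via Q(b, a) = Σ_s b(s) Q(s, a). A minimal sketch of that computation follows (value iteration plus belief-weighted action choice; the I-POMDP nesting over models of other agents is omitted):

```python
import numpy as np

def value_iteration(T, R, gamma=0.95, iters=200):
    """T[a, s, s'] are transition probs, R[s, a] rewards; returns Q[s, a]."""
    n_actions, n_states, _ = T.shape
    Q = np.zeros((n_states, n_actions))
    for _ in range(iters):
        V = Q.max(axis=1)
        Q = R + gamma * np.einsum("ast,t->sa", T, V)
    return Q

def qmdp_action(belief, Q):
    """QMDP: weight fully observable Q-values by the belief over states."""
    return int(np.argmax(belief @ Q))

# Tiny 2-state, 2-action example with an illustrative belief.
T = np.array([[[0.9, 0.1], [0.2, 0.8]],
              [[0.5, 0.5], [0.4, 0.6]]])
R = np.array([[1.0, 0.0], [0.0, 2.0]])
Q = value_iteration(T, R)
print(qmdp_action(np.array([0.3, 0.7]), Q))
```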


#9 General Robustness Evaluation of Incentive Mechanism against Bounded Rationality Using Continuum-Armed Bandits

Authors: Zehong Hu, Jie Zhang, Zhao Li

Incentive mechanisms that assume agents to be fully rational may fail in practice due to agents’ bounded rationality. It is thus crucial to evaluate to what extent mechanisms can resist agents’ bounded rationality, a property we term robustness. In this paper, we propose a general empirical framework for robustness evaluation. One novelty of our framework is a robustness formulation that is generally applicable to different types of incentive mechanisms and bounded rationality models. This formulation considers not only the incentives to agents but also the performance of mechanisms. The other novelty lies in converting the empirical robustness computation into a continuum-armed bandit problem, and then developing an efficient solver with a theoretically guaranteed upper bound on its error rate. We also conduct extensive experiments using various mechanisms to verify the advantages and practicability of our robustness evaluation framework.


#10 Message-Dropout: An Efficient Training Method for Multi-Agent Deep Reinforcement Learning

Authors: Woojun Kim, Myungsik Cho, Youngchul Sung

In this paper, we propose a new learning technique named message-dropout to improve performance in multi-agent deep reinforcement learning under two application scenarios: 1) classical multi-agent reinforcement learning with direct message communication among agents and 2) centralized training with decentralized execution. In the first application scenario of multi-agent systems in which direct message communication among agents is allowed, the message-dropout technique drops out the received messages from other agents in a block-wise manner with a certain probability in the training phase and compensates for this effect by multiplying the weights of the dropped-out block units with a correction probability. The applied message-dropout technique effectively handles the increased input dimension in multi-agent reinforcement learning with communication and makes learning robust against communication errors in the execution phase. In the second application scenario of centralized training with decentralized execution, we particularly consider the application of the proposed message-dropout to Multi-Agent Deep Deterministic Policy Gradient (MADDPG), which uses a centralized critic to train a decentralized actor for each agent. We evaluate the proposed message-dropout technique on several games, and numerical results show that message-dropout with a proper dropout rate significantly improves reinforcement learning performance in terms of both training speed and steady-state performance in the execution phase.
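
Mechanically, block-wise message-dropout resembles ordinary dropout applied per sender rather than per unit. The sketch below uses inverted-dropout rescaling as the correction, which keeps the expected input unchanged; the paper's exact weight correction may differ:

```python
import numpy as np

def message_dropout(messages, drop_prob, rng, training=True):
    """Drop each other-agent message block in its entirety with probability
    drop_prob, rescaling survivors so the expected input is unchanged.

    messages: array of shape (n_senders, msg_dim)."""
    if not training or drop_prob == 0.0:
        return messages
    keep = rng.random(messages.shape[0]) >= drop_prob   # one coin per sender
    mask = keep[:, None].astype(messages.dtype)
    return messages * mask / (1.0 - drop_prob)          # inverted-dropout scaling

rng = np.random.default_rng(4)
msgs = rng.normal(size=(5, 8))      # messages from 5 other agents
noisy = message_dropout(msgs, drop_prob=0.2, rng=rng)
# At execution time (training=False) messages pass through untouched, which
# also makes the learned policy tolerant of genuinely lost messages.
print((noisy == 0).all(axis=1))     # which blocks were dropped this pass
```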


#11 Symmetry-Breaking Constraints for Grid-Based Multi-Agent Path Finding

Authors: Jiaoyang Li, Daniel Harabor, Peter J. Stuckey, Hang Ma, Sven Koenig

We describe a new way of reasoning about symmetric collisions for Multi-Agent Path Finding (MAPF) on 4-neighbor grids. We also introduce a symmetry-breaking constraint to resolve these conflicts. This specialized technique allows us to identify and eliminate, in a single step, all permutations of two currently assigned but incompatible paths. Each such permutation has exactly the same cost as a current path, and each one results in a new collision between the same two agents. We show that the addition of symmetry-breaking techniques can lead to an exponential reduction in the size of the search space of CBS, a popular framework for MAPF, and report significant improvements in both runtime and success rate versus CBSH and EPEA* – two recent and state-of-the-art MAPF algorithms.


#12 Multi-Agent Discussion Mechanism for Natural Language Generation

Authors: Xu Li, Mingming Sun, Ping Li

We introduce the discussion mechanism into the multi-agent communicating encoder-decoder architecture for Natural Language Generation (NLG) tasks and demonstrate that, by applying the discussion mechanism, the communication between agents becomes more effective. Generally speaking, an encoder-decoder architecture predicts the target sequence word by word over several time steps. At each time step of prediction, agents with the discussion mechanism predict the target word after several discussion steps. In the first step of discussion, agents make their choices independently and express their decisions to the other agents. In the next discussion step, agents collect the other agents’ decisions to update their own, then express the updated decisions to the others again. After several iterations, the agents make their final decision based on a well-communicated situation. A benefit of the discussion mechanism is that multiple encoders can be designed with different structures to fit the specified input or to fetch different representations of the input. We train and evaluate the discussion mechanism on Table-to-Text Generation, Text Summarization and Image Caption tasks, respectively. Our empirical results demonstrate that the proposed multi-agent discussion mechanism is helpful for maximizing the utility of the communication between agents.
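
One illustrative way to realize such a discussion round is to let each agent revise its output distribution toward the other agents' current distributions before the final vote. The mixing update below is our own toy stand-in, not the paper's learned discussion operator:

```python
import numpy as np

def discussion_step(logits, self_weight=0.6):
    """One discussion round: every agent mixes its own logits with the
    average of the other agents' current logits (illustrative update)."""
    n = logits.shape[0]
    others_mean = (logits.sum(axis=0, keepdims=True) - logits) / (n - 1)
    return self_weight * logits + (1 - self_weight) * others_mean

def discuss_and_decide(initial_logits, steps=3):
    logits = initial_logits
    for _ in range(steps):                       # exchange and revise decisions
        logits = discussion_step(logits)
    return int(logits.mean(axis=0).argmax())     # final joint word choice

rng = np.random.default_rng(5)
agent_logits = rng.normal(size=(3, 10))   # 3 encoder-agents, 10-word vocabulary
print(discuss_and_decide(agent_logits))
```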


#13 Large Scale Learning of Agent Rationality in Two-Player Zero-Sum Games

Authors: Chun Kai Ling, Fei Fang, J. Zico Kolter

With the recent advances in solving large, zero-sum extensive-form games, there is a growing interest in the inverse problem of inferring underlying game parameters given only access to agent actions. Although recent work provides a powerful differentiable end-to-end learning framework which embeds a game solver within a deep-learning framework, allowing unknown game parameters to be learned via backpropagation, this framework faces significant limitations when applied to boundedly rational human agents and large-scale problems, limiting its practicality. In this paper, we address these limitations and propose a framework that is applicable to more practical settings. First, seeking to learn the rationality of human agents in complex two-player zero-sum games, we draw upon well-known ideas in decision theory to obtain a concise and interpretable agent behavior model, and derive solvers and gradients for end-to-end learning. Second, to scale up to large, real-world scenarios, we propose an efficient first-order primal-dual method which exploits the structure of extensive-form games, yielding significantly faster computation for both game solving and gradient computation. When tested on randomly generated games, we report speedups of orders of magnitude over previous approaches. We also demonstrate the effectiveness of our model on both real-world one-player settings and synthetic data.


#14 Leveraging Observations in Bandits: Between Risks and Benefits

Authors: Andrei Lupu, Audrey Durand, Doina Precup

Imitation learning has been widely used to speed up learning in novice agents, by allowing them to leverage existing data from experts. Allowing an agent to be influenced by external observations can benefit the learning process, but it also puts the agent at risk of following sub-optimal behaviours. In this paper, we study this problem in the context of bandits. More specifically, we consider that an agent (learner) is interacting with a bandit-style decision task, but can also observe a target policy interacting with the same environment. The learner observes only the target’s actions, not the rewards obtained. We introduce a new bandit optimism modifier that uses conditional optimism contingent on the actions of the target in order to guide the agent’s exploration. We analyze the effect of this modification on the well-known Upper Confidence Bound algorithm by proving that it preserves a regret upper bound of order O(ln T), even in the presence of a very poor target, and we derive the dependency of the expected regret on the general target policy. We provide empirical results showing both great benefits as well as certain limitations inherent to observational learning in the multi-armed bandit setting. Experiments are conducted using targets satisfying theoretical assumptions with high probability, thus narrowing the gap between theory and application.
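
To make the setup concrete, here is a sketch of UCB1 augmented with an extra optimism term for arms that the observed target favors. The specific bonus form (proportional to the target's empirical pull frequency) is our illustrative assumption, not the paper's exact modifier:

```python
import math
import random

def observational_ucb(pull, observe_target, n_arms, horizon, beta=0.5):
    """UCB1 with an extra, illustrative optimism term for arms that the
    observed target policy pulls often (target rewards are never seen)."""
    counts = [0] * n_arms
    means = [0.0] * n_arms
    target_counts = [0] * n_arms
    for t in range(1, horizon + 1):
        target_counts[observe_target()] += 1       # see target's action only
        if t <= n_arms:
            arm = t - 1                            # initialize: pull each arm once
        else:
            def index(i):
                bonus = math.sqrt(2 * math.log(t) / counts[i])
                guidance = beta * target_counts[i] / t
                return means[i] + bonus + guidance
            arm = max(range(n_arms), key=index)
        r = pull(arm)
        counts[arm] += 1
        means[arm] += (r - means[arm]) / counts[arm]
    return means, counts

# Bernoulli arms; the target is decent but imperfect (mostly pulls arm 2).
probs = [0.4, 0.5, 0.8]
means, counts = observational_ucb(
    pull=lambda i: float(random.random() < probs[i]),
    observe_target=lambda: random.choices([0, 1, 2], weights=[1, 1, 8])[0],
    n_arms=3, horizon=5000)
print(counts)
```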


#15 TrafficPredict: Trajectory Prediction for Heterogeneous Traffic-Agents

Authors: Yuexin Ma, Xinge Zhu, Sibo Zhang, Ruigang Yang, Wenping Wang, Dinesh Manocha

To navigate safely and efficiently in complex urban traffic, autonomous vehicles must make responsible predictions about surrounding traffic-agents (vehicles, bicycles, pedestrians, etc.). A challenging and critical task is to explore the movement patterns of different traffic-agents and predict their future trajectories accurately to help the autonomous vehicle make reasonable navigation decisions. To solve this problem, we propose a long short-term memory-based (LSTM-based) real-time traffic prediction algorithm, TrafficPredict. Our approach uses an instance layer to learn instances’ movements and interactions and a category layer to learn the similarities of instances belonging to the same type to refine the prediction. To evaluate its performance, we collected trajectory datasets in a large city covering varying conditions and traffic densities. The dataset includes many challenging scenarios where vehicles, bicycles, and pedestrians move among one another. We evaluate the performance of TrafficPredict on our new dataset and highlight its higher accuracy for trajectory prediction compared with prior prediction methods.


#16 Learning to Teach in Cooperative Multiagent Reinforcement Learning

Authors: Shayegan Omidshafiei, Dong-Ki Kim, Miao Liu, Gerald Tesauro, Matthew Riemer, Christopher Amato, Murray Campbell, Jonathan P. How

Collective human knowledge has clearly benefited from the fact that innovations by individuals are taught to others through communication. Similar to human social groups, agents in distributed learning systems would likely benefit from communication to share knowledge and teach skills. The problem of teaching to improve agent learning has been investigated by prior works, but these approaches make assumptions that prevent application of teaching to general multiagent problems, or require domain expertise for the problems to which they apply. This learning to teach problem has inherent complexities related to measuring long-term impacts of teaching that compound the standard multiagent coordination challenges. In contrast to existing works, this paper presents the first general framework and algorithm for intelligent agents to learn to teach in a multiagent environment. Our algorithm, Learning to Coordinate and Teach Reinforcement (LeCTR), addresses peer-to-peer teaching in cooperative multiagent reinforcement learning. Each agent in our approach learns both when and what to advise, then uses the received advice to improve local learning. Importantly, these roles are not fixed; the agents learn to assume the role of student and/or teacher at the appropriate moments, requesting and providing advice in order to improve team-wide performance and learning. Empirical comparisons against state-of-the-art teaching methods show that our teaching agents not only learn significantly faster, but also learn to coordinate in tasks where existing methods fail.


#17 Overcoming Blind Spots in the Real World: Leveraging Complementary Abilities for Joint Execution

Authors: Ramya Ramakrishnan, Ece Kamar, Besmira Nushi, Debadeepta Dey, Julie Shah, Eric Horvitz

Simulators are being increasingly used to train agents before deploying them in real-world environments. While training in simulation provides a cost-effective way to learn, poorly modeled aspects of the simulator can lead to costly mistakes, or blind spots. While humans can help guide an agent towards identifying these error regions, humans themselves have blind spots and noise in execution. We study how learning about the blind spots of both can be used to manage hand-off decisions when humans and agents act jointly in a real world in which neither of them has been fully trained or evaluated. The formulation assumes that agent blind spots result from representational limitations in the simulation world, which lead the agent to ignore important features that are relevant for acting in the open world. Our approach for blind spot discovery combines experiences collected in simulation with limited human demonstrations. The first step applies imitation learning to demonstration data to identify important features that the human is using but that the agent is missing. The second step uses noisy labels extracted from action mismatches between the agent and the human across simulation and demonstration data to train blind spot models. We show through experiments on two domains that our approach is able to learn a succinct representation that accurately captures blind spot regions and avoids dangerous errors in the real world through transfer of control between the agent and the human.


#18 Evolution of Collective Fairness in Hybrid Populations of Humans and Agents

Authors: Fernando P. Santos, Jorge M. Pacheco, Ana Paiva, Francisco C. Santos

Fairness plays a fundamental role in decision-making, as evidenced by the high incidence of human behaviors that result in egalitarian outcomes. This is often shown in the context of dyadic interactions, resorting to the Ultimatum Game. The peculiarities of group interactions – and their effect in eliciting fair actions – remain, however, largely unexplored. Focusing on groups raises several questions related to the effect of group size, group decision rules and the interplay of human and agent behaviors in hybrid groups. To address these topics, we test a Multiplayer version of the Ultimatum Game (MUG): proposals are made to groups of Responders that, collectively, accept or reject them. First, we run an online experiment to evaluate how humans react to different group decision rules. We observe that people become increasingly fair if groups adopt stricter decision rules, i.e., if more individuals are required to accept a proposal for it to be accepted by the group. Second, we propose a new analytical model to shed light on how such behaviors may have evolved. Third, we adapt our model to include agents with fixed behaviors. We show that including hardcoded pro-social agents favors the evolutionary stability of fair states, even for soft group decision rules. This suggests that judiciously introducing agents with particular behaviors in a population may leverage long-term social benefits.
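
The game itself is compact enough to state in code: a Proposer offers a fraction of the pie to N Responders, each Responder votes, and the group accepts iff at least M vote yes (stricter rules correspond to larger M). The equal split of an accepted offer among Responders is our simplifying assumption:

```python
def multiplayer_ultimatum(offer, thresholds, m_required):
    """One round of MUG. Each responder accepts iff offer >= her threshold;
    the group accepts iff at least m_required responders accept.

    Returns (proposer_payoff, per_responder_payoff)."""
    accepts = sum(offer >= q for q in thresholds)
    if accepts >= m_required:
        n = len(thresholds)
        return 1.0 - offer, offer / n   # accepted: offer split equally (assumed)
    return 0.0, 0.0                     # rejected: everyone gets nothing

# A strict rule (all 3 must accept) punishes an offer that a soft rule
# (any single acceptance suffices) would let through.
thresholds = [0.1, 0.3, 0.5]
for m in (1, 3):
    print(m, multiplayer_ultimatum(offer=0.35, thresholds=thresholds, m_required=m))
```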


#19 Multi-Winner Contests for Strategic Diffusion in Social Networks

Authors: Wen Shen, Yang Feng, Cristina V. Lopes

Strategic diffusion encourages participants to take active roles in promoting stakeholders’ agendas by rewarding successful referrals. As social media continues to transform the way people communicate, strategic diffusion has become a powerful tool for stakeholders to influence people’s decisions or behaviors for desired objectives. Existing reward mechanisms for strategic diffusion are usually either vulnerable to false-name attacks or not individually rational for participants who have made successful referrals. Here, we introduce a novel multi-winner contests (MWC) mechanism for strategic diffusion in social networks. The MWC mechanism satisfies several desirable properties, including false-name-proofness, individual rationality, budget constraint, monotonicity, and subgraph constraint. Numerical experiments on four real-world social network datasets demonstrate that stakeholders can significantly boost participants’ aggregated efforts with proper design of competitions. Our work sheds light on how to design manipulation-resistant mechanisms with appropriate contests.


#20 Theory of Minds: Understanding Behavior in Groups through Inverse Planning

Authors: Michael Shum, Max Kleiman-Weiner, Michael L. Littman, Joshua B. Tenenbaum

Human social behavior is structured by relationships. We form teams, groups, tribes, and alliances at all scales of human life. These structures guide multi-agent cooperation and competition, but when we observe others these underlying relationships are typically unobservable and hence must be inferred. Humans make these inferences intuitively and flexibly, often making rapid generalizations about the latent relationships that underlie behavior from just sparse and noisy observations. Rapid and accurate inferences are important for determining who to cooperate with, who to compete with, and how to cooperate in order to compete. Towards the goal of building machine-learning algorithms with human-like social intelligence, we develop a generative model of multiagent action understanding based on a novel representation for these latent relationships called Composable Team Hierarchies (CTH). This representation is grounded in the formalism of stochastic games and multi-agent reinforcement learning. We use CTH as a target for Bayesian inference yielding a new algorithm for understanding behavior in groups that can both infer hidden relationships as well as predict future actions for multiple agents interacting together. Our algorithm rapidly recovers an underlying causal model of how agents relate in spatial stochastic games from just a few observations. The patterns of inference made by this algorithm closely correspond with human judgments and the algorithm makes the same rapid generalizations that people do.


#21 Multiagent Decision Making For Maritime Traffic Management

Authors: Arambam James Singh, Duc Thien Nguyen, Akshat Kumar, Hoong Chuin Lau

We address the problem of maritime traffic management in busy waterways, with the goal of increasing the safety of navigation by reducing congestion. We model maritime traffic as a large multiagent system with individual vessels as agents and the vessel traffic service (VTS) authority as the regulatory agent. We develop a maritime traffic simulator based on historical traffic data that incorporates realistic domain constraints such as the uncertain and asynchronous movement of vessels. We also develop a traffic coordination approach that provides speed recommendations to vessels in different zones. We exploit the nature of collective interactions among agents to develop a policy gradient approach that can scale up to real-world problems. Empirical results on synthetic and real-world problems show that our approach can significantly reduce congestion while keeping traffic throughput high.


#22 Probabilistic Alternating-Time µ-Calculus

Authors: Fu Song, Yedi Zhang, Taolue Chen, Yu Tang, Zhiwu Xu

Reasoning about strategic abilities is key to AI systems consisting of multiple agents with random behaviors. We propose a probabilistic extension of the Alternating µ-Calculus (AMC), named PAMC, for reasoning about the strategic abilities of agents in stochastic multi-agent systems. PAMC subsumes the existing logics AMC and PµTL. The usefulness of PAMC is exemplified by applications in genetic regulatory networks. We show that, for PAMC, the model checking problem is in UP ∩ co-UP and the satisfiability problem is EXPTIME-complete, both the same as for AMC. Moreover, PAMC admits the small model property. We implement the satisfiability checking procedure in the tool PAMCSolver.