| Total: 38

Poverty and economic hardship are understood to be highly complex and dynamic phenomena. Due to the multi-faceted nature of welfare, assistance programs targeted at alleviating hardship can face challenges, as they often rely on simpler welfare measurements, such as income or wealth, that fail to capture to full complexity of each family's state. Here, we explore one important dimension – susceptibility to income shocks. We introduce a model of welfare that incorporates income, wealth, and income shocks and analyze this model to show that it can vary, at times substantially, from measures of welfare that only use income or wealth. We then study the algorithmic problem of optimally allocating subsidies in the presence of income shocks. We consider two well-studied objectives: the first aims to minimize the expected number of agents that fall below a given welfare threshold (a min-sum objective) and the second aims to minimize the likelihood that the most vulnerable agent falls below this threshold (a min-max objective). We present optimal and near-optimal algorithms for various general settings. We close with a discussion on future directions on allocating societal resources and ethical implications of related approaches.

It is often advantageous to be able to extract resource requirements in resource logics of strategic ability, rather than to verify whether a fixed resource requirement is sufficient for achieving a goal. We study Parameterised Resource-Bounded Alternating Time Temporal Logic where parameter extraction is possible. We give a parameter extraction algorithm and prove that the model-checking problem is 2EXPTIME-complete.

Social dilemmas have been widely studied to explain how humans are able to cooperate in society. Considerable effort has been invested in designing artificial agents for social dilemmas that incorporate explicit agent motivations that are chosen to favor coordinated or cooperative responses. The prevalence of this general approach points towards the importance of achieving an understanding of both an agent's internal design and external environment dynamics that facilitate cooperative behavior. In this paper, we investigate how partner selection can promote cooperative behavior between agents who are trained to maximize a purely selfish objective function. Our experiments reveal that agents trained with this dynamic learn a strategy that retaliates against defectors while promoting cooperation with other agents resulting in a prosocial society.

We investigate the possibility of an incentive-compatible (IC, a.k.a. strategy-proof) mechanism for the classification of agents in a network according to their reviews of each other. In the α-classification problem we are interested in selecting the top α fraction of users. We give upper bounds (impossibilities) and lower bounds (mechanisms) on the worst-case coincidence between the classification of an IC mechanism and the ideal α-classification. We prove bounds which depend on α and on the maximal number of reviews given by a single agent, Δ. Our results show that it is harder to find a good mechanism when α is smaller and Δ is larger. In particular, if Δ is unbounded, then the best mechanism is trivial (that is, it does not take into account the reviews). On the other hand, when Δ is sublinear in the number of agents, we give a simple, natural mechanism, with a coincidence ratio of α.

In this paper we describe a novel approach to team formation based on the value of inter-agent interactions. Specifically, we propose a model of teamwork that considers outcomes from chains of interactions between agents. Based on our model, we devise a number of network metrics to capture the contribution of interactions between agents. This is then used to learn the value of teamwork from historical team performance data. We apply our model to predict team performance and validate our approach using real-world team performance data from the 2018 FIFA World Cup. Our model is shown to better predict the real-world performance of teams by up to 46% compared to models that ignore inter-agent interactions.

We study the problem of verifying multi-agent systems under the assumption of bounded recall. We introduce the logic CTLKBR, a bounded-recall variant of the temporal-epistemic logic CTLK. We define and study the model checking problem against CTLK specifications under incomplete information and bounded recall and present complexity upper bounds. We present an extension of the BDD-based model checker MCMAS implementing model checking under bounded recall semantics and discuss the experimental results obtained.

Coalition Structure Generation (CSG) is an NP-complete problem that remains difficult to solve on account of its complexity. In this paper, we propose an efficient hybrid algorithm for optimal coalition structure generation called ODSS. ODSS is a hybrid version of two previously established algorithms IDP (Rahwan and Jennings 2008) and IP (Rahwan et al. 2009). ODSS minimizes the overlapping between IDP and IP by dividing the whole search space of CSG into two disjoint sets of subspaces and proposes a novel subspace shrinking technique to reduce the size of the subspace searched by IP with the help of IDP. When compared to the state-of-the-art against a wide variety of value distributions, ODSS is shown to perform better by up to 54.15% on benchmark inputs.

Search and inference are two main strategies for optimally solving Distributed Constraint Optimization Problems (DCOPs). Recently, several algorithms were proposed to combine their advantages. Unfortunately, such algorithms only use an approximated inference as a one-shot preprocessing phase to construct the initial lower bounds which lead to inefficient pruning under the limited memory budget. On the other hand, iterative inference algorithms (e.g., MB-DPOP) perform a context-based complete inference for all possible contexts but suffer from tremendous traffic overheads. In this paper, (i) hybridizing search with context-based inference, we propose a complete algorithm for DCOPs, named HS-CAI where the inference utilizes the contexts derived from the search process to establish tight lower bounds while the search uses such bounds for efficient pruning and thereby reduces contexts for the inference. Furthermore, (ii) we introduce a context evaluation mechanism to select the context patterns for the inference to further reduce the overheads incurred by iterative inferences. Finally, (iii) we prove the correctness of our algorithm and the experimental results demonstrate its superiority over the state-of-the-art.

In the ad hoc teamwork setting, a team of agents needs to perform a task without prior coordination. The most advanced approach learns policies based on previous experiences and reuses one of the policies to interact with new teammates. However, the selected policy in many cases is sub-optimal. Switching between policies to adapt to new teammates' behaviour takes time, which threatens the successful performance of a task. In this paper, we propose AATEAM – a method that uses the attention-based neural networks to cope with new teammates' behaviour in real-time. We train one attention network per teammate type. The attention networks learn both to extract the temporal correlations from the sequence of states (i.e. contexts) and the mapping from contexts to actions. Each attention network also learns to predict a future state given the current context and its output action. The prediction accuracies help to determine which actions the ad hoc agent should take. We perform extensive experiments to show the effectiveness of our method.

We analyse opinion diffusion in social networks, where a finite set of individuals is connected in a directed graph and each simultaneously changes their opinion to that of the majority of their influencers. We study the algorithmic properties of the fixed-point behaviour of such networks, showing that the problem of establishing whether individuals converge to stable opinions is PSPACE-complete.

Distributed Constraint Optimization Problems (DCOPs) are a widely studied constraint handling framework. The objective of a DCOP algorithm is to optimize a global objective function that can be described as the aggregation of several distributed constraint cost functions. In a DCOP, each of these functions is defined by a set of discrete variables. However, in many applications, such as target tracking or sleep scheduling in sensor networks, continuous valued variables are more suited than the discrete ones. Considering this, Functional DCOPs (F-DCOPs) have been proposed that can explicitly model a problem containing continuous variables. Nevertheless, state-of-the-art F-DCOPs approaches experience onerous memory or computation overhead. To address this issue, we propose a new F-DCOP algorithm, namely Particle Swarm based F-DCOP (PFD), which is inspired by a meta-heuristic, Particle Swarm Optimization (PSO). Although it has been successfully applied to many continuous optimization problems, the potential of PSO has not been utilized in F-DCOPs. To be exact, PFD devises a distributed method of solution construction while significantly reducing the computation and memory requirements. Moreover, we theoretically prove that PFD is an anytime algorithm. Finally, our empirical results indicate that PFD outperforms the state-of-the-art approaches in terms of solution quality and computation overhead.

Agent programming languages have proved useful for formally modelling implemented systems such as PRS and JACK, and for reasoning about their behaviour. Over the past decades, many agent programming languages and extensions have been developed. A key feature in some of them is their support for the specification of ‘concurrent’ actions and programs. However, their notion of concurrency is still limited, as it amounts to a nondeterministic choice between (sequential) action interleavings. Thus, the notion does not represent ‘true concurrency’, which can more naturally exploit multi-core computers and multi-robot manufacturing cells. This paper provides a true concurrency operational semantics for a BDI agent programming language, allowing actions to overlap in execution. We prove key properties of the semantics, relating to true concurrency and to its link with interleaving.

In open agent systems, the set of agents that are cooperating or competing changes over time and in ways that are nontrivial to predict. For example, if collaborative robots were tasked with fighting wildfires, they may run out of suppressants and be temporarily unavailable to assist their peers. We consider the problem of planning in these contexts with the additional challenges that the agents are unable to communicate with each other and that there are many of them. Because an agent's optimal action depends on the actions of others, each agent must not only predict the actions of its peers, but, before that, reason whether they are even present to perform an action. Addressing openness thus requires agents to model each other's presence, which becomes computationally intractable with high numbers of agents. We present a novel, principled, and scalable method in this context that enables an agent to reason about others' presence in its shared environment and their actions. Our method extrapolates models of a few peers to the overall behavior of the many-agent system, and combines it with a generalization of Monte Carlo tree search to perform individual agent reasoning in many-agent open environments. Theoretical analyses establish the number of agents to model in order to achieve acceptable worst case bounds on extrapolation error, as well as regret bounds on the agent's utility from modeling only some neighbors. Simulations of multiagent wildfire suppression problems demonstrate our approach's efficacy compared with alternative baselines.

We consider the classical problem of allocating resources among agents in an envy-free (and, where applicable, proportional) way. Recently, the basic model was enriched by introducing the concept of a social network which allows to capture situations where agents might not have full information about the allocation of all resources. We initiate the study of the parameterized complexity of these resource allocation problems by considering natural parameters which capture structural properties of the network and similarities between agents and items. In particular, we show that even very general fragments of the considered problems become tractable as long as the social network has bounded treewidth or bounded clique-width. We complement our results with matching lower bounds which show that our algorithms cannot be substantially improved.

Learning by experience in Multi-Agent Systems (MAS) is a difficult and exciting task, due to the lack of stationarity of the environment, whose dynamics evolves as the population learns. In order to design scalable algorithms for systems with a large population of interacting agents (e.g., swarms), this paper focuses on Mean Field MAS, where the number of agents is asymptotically infinite. Recently, a very active burgeoning field studies the effects of diverse reinforcement learning algorithms for agents with no prior information on a stationary Mean Field Game (MFG) and learn their policy through repeated experience. We adopt a high perspective on this problem and analyze in full generality the convergence of a fictitious iterative scheme using any single agent learning algorithm at each step. We quantify the quality of the computed approximate Nash equilibrium, in terms of the accumulated errors arising at each learning iteration step. Notably, we show for the first time convergence of model free learning algorithms towards non-stationary MFG equilibria, relying only on classical assumptions on the MFG dynamics. We illustrate our theoretical results with a numerical experiment in a continuous action-space environment, where the approximate best response of the iterative fictitious play scheme is computed with a deep RL algorithm.

Epistemic planning can be used to achieve implicit coordination in cooperative multi-agent settings where knowledge and capabilities are distributed between the agents. In these scenarios, agents plan and act on their own without having to agree on a common plan or protocol beforehand. However, epistemic planning is undecidable in general. In this paper, we show how implicit coordination can be achieved in a simpler, propositional setting by using nondeterminism as a means to allow the agents to take the other agents' perspectives. We identify a decidable fragment of epistemic planning that allows for arbitrary initial state uncertainty and non-determinism, but where actions can never increase the uncertainty of the agents. We show that in this fragment, planning for implicit coordination can be reduced to a version of fully observable nondeterministic (FOND) planning and that it thus has the same computational complexity as FOND planning. We provide a small case study, modeling the problem of multi-agent path finding with destination uncertainty in FOND, to show that our approach can be successfully applied in practice.

This work focuses on multi-agent reinforcement learning (RL) with inter-agent communication, in which communication is differentiable and optimized through backpropagation. Such differentiable approaches tend to converge more quickly to higher-quality policies compared to techniques that treat communication as actions in a traditional RL framework. However, modern communication networks (e.g., Wi-Fi or Bluetooth) rely on discrete communication channels, for which existing differentiable approaches that consider real-valued messages cannot be directly applied, or require biased gradient estimators. Some works have overcome this problem by treating the message space as an extension of the action space, and use standard RL to optimize message selection, but these methods tend to converge slower and to inferior policies. In this paper, we propose a stochastic message encoding/decoding procedure that makes a discrete communication channel mathematically equivalent to an analog channel with additive noise, through which gradients can be backpropagated. Additionally, we introduce an encryption step for use in noisy channels that forces channel noise to be message-independent, allowing us to compute unbiased derivative estimates even in the presence of unknown channel noise. To the best of our knowledge, this work presents the first differentiable communication learning approach that can compute unbiased derivatives through channels with unknown noise. We demonstrate the effectiveness of our approach in two example multi-robot tasks: a path finding and a collaborative search problem. There, we show that our approach achieves learning speed and performance similar to differentiable communication learning with real-valued messages (i.e., unlimited communication bandwidth), while naturally handling more realistic real-world communication constraints. Content Areas: Multi-Agent Communication, Reinforcement Learning.

We develop a Distributed Event-Triggered Stochastic GRAdient Descent (DETSGRAD) algorithm for solving non-convex optimization problems typically encountered in distributed deep learning. We propose a novel communication triggering mechanism that would allow the networked agents to update their model parameters aperiodically and provide sufficient conditions on the algorithm step-sizes that guarantee the asymptotic mean-square convergence. The algorithm is applied to a distributed supervised-learning problem, in which a set of networked agents collaboratively train their individual neural networks to perform image classification, while aperiodically sharing the model parameters with their one-hop neighbors. Results indicate that all agents report similar performance that is also comparable to the performance of a centrally trained neural network, while the event-triggered communication provides significant reduction in inter-agent communication. Results also show that the proposed algorithm allows the individual agents to classify the images even though the training data corresponding to all the classes are not locally available to each agent.

Many emerging AI applications request distributed machine learning (ML) among edge systems (e.g., IoT devices and PCs at the edge of the Internet), where data cannot be uploaded to a central venue for model training, due to their large volumes and/or security/privacy concerns. Edge devices are intrinsically heterogeneous in computing capacity, posing significant challenges to parameter synchronization for parallel training with the parameter server (PS) architecture. This paper proposes ADSP, a parameter synchronization model for distributed machine learning (ML) with heterogeneous edge systems. Eliminating the significant waiting time occurring with existing parameter synchronization models, the core idea of ADSP is to let faster edge devices continue training, while committing their model updates at strategically decided intervals. We design algorithms that decide time points for each worker to commit its model update, and ensure not only global model convergence but also faster convergence. Our testbed implementation and experiments show that ADSP outperforms existing parameter synchronization models significantly in terms of ML model convergence time, scalability and adaptability to large heterogeneity.

Recent superhuman results in games have largely been achieved in a variety of zero-sum settings, such as Go and Poker, in which agents need to compete against others. However, just like humans, real-world AI systems have to coordinate and communicate with other agents in cooperative partially observable environments as well. These settings commonly require participants to both interpret the actions of others and to act in a way that is informative when being interpreted. Those abilities are typically summarized as theory of mind and are seen as crucial for social interactions. In this paper we propose two different search techniques that can be applied to improve an arbitrary agreed-upon policy in a cooperative partially observable game. The first one, single-agent search, effectively converts the problem into a single agent setting by making all but one of the agents play according to the agreed-upon policy. In contrast, in multi-agent search all agents carry out the same common-knowledge search procedure whenever doing so is computationally feasible, and fall back to playing according to the agreed-upon policy otherwise. We prove that these search procedures are theoretically guaranteed to at least maintain the original performance of the agreed-upon policy (up to a bounded approximation error). In the benchmark challenge problem of Hanabi, our search technique greatly improves the performance of every agent we tested and when applied to a policy trained using RL achieves a new state-of-the-art score of 24.61 / 25 in the game, compared to a previous-best of 24.08 / 25.

Understanding and modeling behavior of multi-agent systems is a central step for artificial intelligence. Here we present a deep generative model which captures behavior generating process of multi-agent systems, supports accurate predictions and inference, infers how agents interact in a complex system, as well as identifies agent groups and interaction types. Built upon advances in deep generative models and a novel attention mechanism, our model can learn interactions in highly heterogeneous systems with linear complexity in the number of agents. We apply this model to three multi-agent systems in different domains and evaluate performance on a diverse set of tasks including behavior prediction, interaction analysis and system identification. Experimental results demonstrate its ability to model multi-agent systems, yielding improved performance over competitive baselines. We also show the model can successfully identify agent groups and interaction types in these systems. Our model offers new opportunities to predict complex multi-agent behaviors and takes a step forward in understanding interactions in multi-agent systems.

Coordinating multiple interacting agents to achieve a common goal is a difficult task with huge applicability. This problem remains hard to solve, even when limiting interactions to be mediated via a static interaction-graph. We present a novel approximate solution method for multi-agent Markov decision problems on graphs, based on variational perturbation theory. We adopt the strategy of planning via inference, which has been explored in various prior works. We employ a non-trivial extension of a novel high-order variational method that allows for approximate inference in large networks and has been shown to surpass the accuracy of existing variational methods. To compare our method to two state-of-the-art methods for multi-agent planning on graphs, we apply the method different standard GMDP problems. We show that in cases, where the goal is encoded as a non-local cost function, our method performs well, while state-of-the-art methods approach the performance of random guess. In a final experiment, we demonstrate that our method brings significant improvement for synchronization tasks.

In large-scale multi-agent systems, the large number of agents and complex game relationship cause great difficulty for policy learning. Therefore, simplifying the learning process is an important research issue. In many multi-agent systems, the interactions between agents often happen locally, which means that agents neither need to coordinate with all other agents nor need to coordinate with others all the time. Traditional methods attempt to use pre-defined rules to capture the interaction relationship between agents. However, the methods cannot be directly used in a large-scale environment due to the difficulty of transforming the complex interactions between agents into rules. In this paper, we model the relationship between agents by a complete graph and propose a novel game abstraction mechanism based on two-stage attention network (G2ANet), which can indicate whether there is an interaction between two agents and the importance of the interaction. We integrate this detection mechanism into graph neural network-based multi-agent reinforcement learning for conducting game abstraction and propose two novel learning algorithms GA-Comm and GA-AC. We conduct experiments in Traffic Junction and Predator-Prey. The results indicate that the proposed methods can simplify the learning process and meanwhile get better asymptotic performance compared with state-of-the-art algorithms.

Social psychology and real experiences show that cognitive consistency plays an important role to keep human society in order: if people have a more consistent cognition about their environments, they are more likely to achieve better cooperation. Meanwhile, only cognitive consistency within a neighborhood matters because humans only interact directly with their neighbors. Inspired by these observations, we take the first step to introduce neighborhood cognitive consistency (NCC) into multi-agent reinforcement learning (MARL). Our NCC design is quite general and can be easily combined with existing MARL methods. As examples, we propose neighborhood cognition consistent deep Q-learning and Actor-Critic to facilitate large-scale multi-agent cooperations. Extensive experiments on several challenging tasks (i.e., packet routing, wifi configuration and Google football player control) justify the superior performance of our methods compared with state-of-the-art MARL approaches.

We consider the challenging problem of online planning for a team of agents to autonomously search and track a time-varying number of mobile objects under the practical constraint of detection range limited onboard sensors. A standard POMDP with a value function that either encourages discovery or accurate tracking of mobile objects is inadequate to simultaneously meet the conflicting goals of searching for undiscovered mobile objects whilst keeping track of discovered objects. The planning problem is further complicated by misdetections or false detections of objects caused by range limited sensors and noise inherent to sensor measurements. We formulate a novel multi-objective POMDP based on information theoretic criteria, and an online multi-object tracking filter for the problem. Since controlling multi-agent is a well known combinatorial optimization problem, assigning control actions to agents necessitates a greedy algorithm. We prove that our proposed multi-objective value function is a monotone submodular set function; consequently, the greedy algorithm can achieve a (1-1/e) approximation for maximizing the submodular multi-objective function.