Total: 18

Social punishment, whereby cooperators punish defectors, has been suggested as an important mechanism that promotes the emergence of cooperation or maintenance of social norms in the context of the one-shot (i.e. non-repeated) interaction. However, whenever antisocial punishment, whereby defectors punish cooperators, is available, this antisocial behavior outperforms social punishment, leading to the destruction of cooperation. In this paper, we use evolutionary game theory to show that this antisocial behavior can be efficiently restrained by relying on prior commitments, wherein agents can arrange, prior to an interaction, agreements regarding posterior compensation by those who dishonor the agreements. We show that, although the commitment mechanism by itself can guarantee a notable level of cooperation, a significantly higher level is achieved when both mechanisms, those of proposing prior commitments and of punishment, are available in co-presence. Interestingly, social punishment prevails and dominates in this system as it can take advantage of the commitment mechanism to cope with antisocial behaviors. That is, establishment of a commitment system helps to pave the way for the evolution of social punishment and abundant cooperation, even in the presence of antisocial punishment.

The current frameworks of reasoning about agents' collective strategy are either too conservative or too liberal in terms of the sharing of local information between agents. In this paper, we argue that in many cases, a suitable amount of information is required to be communicated between agents to both enforce goals and keep privacy. Several communication operators are proposed to work with an epistemic strategy logic ATLK. The complexity of model checking resulting logics is studied, and surprisingly, we found that the additional expressiveness from the communication operators comes for free.

Model checking probabilistic knowledge of memoryful semantics is undecidable, even for a simple formula concerning the reachability of probabilistic knowledge of a single agent. This result suggests that the usual approach of tackling undecidable model checking problems, by finding syntactic restrictions over the logic language, may not suffice. In this paper, we propose to work with an additional restriction that agent's knowledge concerns a special class of atomic propositions. A PSPACE-complete case is identified with this additional restriction, for a logic language combining LTL with limit-sure knowledge of a single agent.

Communication between agents has the potential to improve team performance of collaborative tasks. However, communication is not free in most domains, requiring agents to reason about the costs and benefits of sharing information. In this work, we develop an online, decentralized communication policy, ConTaCT, that enables agents to decide whether or not to communicate during time-critical collaborative tasks in unknown, deterministic environments. Our approach is motivated by real-world applications, including the coordination of disaster response and search and rescue teams. These settings motivate a model structure that explicitly represents the world model as initially unknown but deterministic in nature, and that de-emphasizes uncertainty about action outcomes. Simulated experiments are conducted in which ConTaCT is compared to other multi-agent communication policies, and results indicate that ConTaCT achieves comparable task performance while substantially reducing communication overhead.

Pushdown multi-agent systems, modeled by pushdown game structures (PGSs), are an important paradigm of infinite-state multi-agent systems. Alternating-time temporal logics are well-known specification formalisms for multi-agent systems, where the selective path quantifier is introduced to reason about strategies of agents. In this paper, we investigate model checking algorithms for variants of alternating-time temporal logics over PGSs, initiated by Murano and Perelli at IJCAI'15. We first give a triply exponential-time model checking algorithm for ATL* over PGSs. The algorithm is based on the saturation method, and is the first global model checking algorithm with a matching lower bound. Next, we study the model checking problem for the alternating-time mu-calculus. We propose an exponential-time global model checking algorithm which extends similar algorithms for pushdown systems and modal mu-calculus. The algorithm admits a matching lower bound, which holds even for the alternation-free fragment and ATL.

In trust systems, unfair rating attacks — where advisors provide ratings dishonestly — influence the accuracy of trust evaluation. A secure trust system should function properly under all possible unfair rating attacks; including dynamic attacks. In the literature, camouflage attacks are the most studied dynamic attacks. But an open question is whether more harmful dynamic attacks exist. We propose random processes to model and measure dynamic attacks. The harm of an attack is influenced by a user's ability to learn from the past. We consider three types of users: blind users, aware users, and general users. We found for all the three types, camouflage attacks are far from the most harmful. We identified the most harmful attacks, under which we found the ratings may still be useful to users.

This paper introduces an extension of the target surveillance problem in which the surveillance agent is exposed to an adversarial ballistic threat. The problem is formulated as a mixed observability Markov decision process (MOMDP), which is a factored variant of the partially observable Markov decision process, to account for state and dynamic uncertainties. The control policy resulting from solving the MOMDP aims to optimize the frequency of target observations and minimize exposure to the ballistic threat. The adversary’s behavior is modeled with a level-k policy, which is used to construct the state transition of the MOMDP. The approach is empirically evaluated against a MOMDP adversary and against a human opponent in a target surveillance computer game. The empirical results demonstrate that, on average, level 3 MOMDP policies outperform lower level reasoning policies as well as human players.

The application of DCOP models to large problems faces two main limitations: (i) Modeling limitations, as each agent can handle only a single variable of the problem; and (ii) Resolution limitations, as current approaches do not exploit the local problem structure withineach agent. This paper proposes a novel Multi-Variable Agent (MVA) DCOP decompositiontechnique, which: (i) Exploits the co-locality of each agent's variables, allowing us to adopt efficient centralized techniques within each agent; (ii) Enables the use of hierarchical parallel models and proposes the use of GPUs; and (iii) Reduces the amount of computation and communication required in several classes of DCOP algorithms.

Bribery in elections is an important problem in computational social choice theory. We introduce and study two important special cases of the bribery problem, namely, FRUGAL-BRIBERY and FRUGAL-$BRIBERY where the briber is frugal in nature. By this, we mean that the briber is only able to influence voters who benefit from the suggestion of the briber. More formally, a voter is vulnerable if the outcome of the election improves according to her own preference when she accepts the suggestion of the briber. In the FRUGAL-BRIBERY problem, the goal is to make a certain candidate win the election by changing only the vulnerable votes. In the FRUGAL-$BRIBERY problem, the vulnerable votes have prices and the goal is to make a certain candidate win the election by changing only the vulnerable votes, subject to a budget constraint. We show that both the FRUGAL-BRIBERY and the FRUGAL-$BRIBERY problems are intractable for many commonly used voting rules for weighted as well as unweighted elections. These intractability results demonstrate that bribery is a hard computational problem, in the sense that several special cases of this problem continue to be computationally intractable. This strengthens the view that bribery, although a possible attack on an election in principle, may be infeasible in practice.

A key advantage the reactive planning approach adopted by BDI-based agents is the ability to recover from plan execution failures, and almost all BDI agent programming languages and platforms provide some form of failure handling mechanism. In general, these consist of simply choosing an alternative plan for the failed subgoal (e.g., JACK, Jadex). In this paper, we propose an alternative approach to recovering from execution failures that relies on exploiting positive interactions between an agent's intentions. A positive interaction occurs when the execution of an action in one intention assists the execution of actions in other intentions (e.g., by (re)establishing their preconditions). We have implemented our approach in a scheduling algorithm for BDI agents which we call SP. The results of a preliminary empirical evaluation of SP suggest our approach out-performs existing failure handling mechanisms used by state-of-the-art BDI languages. Moreover, the computational overhead of SP is modest.

In agent-based simulation, emergent equilibrium describes the macroscopic steady states of agents' interactions. While the state of individual agents might be changing, the collective behavior pattern remains the same in macroscopic equilibrium states. Traditionally, these emergent equilibriums are calculated using Monte Carlo methods. However, these methods require thousands of repeated simulation runs, which are extremely time-consuming. In this paper, we propose a novel three-layer framework to efficiently compute emergent equilibriums. The framework consists of a macro-level pseudo-arclength equilibrium solver (PAES), a micro-level simulator (MLS) and a macro-micro bridge (MMB). It can adaptively explore parameter space and recursively compute equilibrium states using the predictor-corrector scheme. We apply the framework to the popular opinion dynamics and labour market models. The experimental results show that our framework outperformed Monte Carlo experiments in terms of computation efficiency while maintaining the accuracy.

In crowded multi-agent navigation environments, the motion of the agents is significantly constrained by the motion of the nearby agents. This makes planning paths very difficult and leads to inefficient global motion. To address this problem, we propose a new distributed approach to coordinate the motions of agents in crowded environments. With our approach, agents take into account the velocities and goals of their neighbors and optimize their motion accordingly and in real-time. We experimentally validate our coordination approach in a variety of scenarios and show that its performance scales to scenarios with hundreds of agents.

We study the (parameterized) complexity of Shift Bribery for multiwinner voting rules. We focus on the SNTV, Bloc, k-Borda, and Chamberlin-Courant rules, as well as on approximate variants of the Chamberlin-Courant rule, since the original rule is NP-hard to compute. We show that Shift Bribery tends to be significantly harder in the multiwinner setting than in the single-winner one by showing settings where Shift Bribery is easy in the single-winner cases, but is hard (and hard to approximate) in the multiwinner ones. We show that the non-monotonicity of those rules which are based on approximation algorithms for the Chamberlin--Courant rule sometimes affects the complexity of Shift Bribery.

Many solution methods for Markov Decision Processes (MDPs) exploit structure in the problem and are based on value function factorization. Especially multiagent settings, however, are known to suffer from an exponential increase in value component sizes as interactions become denser, restricting problem sizes and types that can be handled. We present an approach to mitigate this limitation for certain types of multiagent systems, exploiting a property that can be thought of as "anonymous influence" in the factored MDP. We show how representational benefits from anonymity translate into computational efficiencies, both for variable elimination in a factor graph and for the approximate linear programming solution to factored MDPs. Our methods scale to factored MDPs that were previously unsolvable, such as the control of a stochastic disease process over densely connected graphs with 50 nodes and 25 agents.

Plan monitoring in a collaborative multi-agent system requires an agent to not only monitor the execution of its own plan, but also to detect possible deviations or failures in the plan execution of its teammates. In domains featuring partial observability and uncertainty in the agents’ sensing and actuation, especially where communication among agents is sparse (as a part of a cost-minimized plan), plan monitoring can be a significant challenge. We design an Expectation Maximization (EM) based algorithm for detection of plan deviation of teammates in such a multi-agent system. However, a direct implementation of this algorithm is intractable, so we also design an alternative approach grounded on the agents’ plans, for tractability. We establish its equivalence to the intractable version, and evaluate these techniques in some challenging tasks.

Decentralized partially observable Markov decision processes (Dec-POMDPs) provide a general framework for multiagent sequential decision-making under uncertainty. Although Dec-POMDPs are typically intractable to solve for real-world problems, recent research on macro-actions (i.e., temporally-extended actions) has significantly increased the size of problems that can be solved. However, current methods assume the underlying Dec-POMDP model is known a priori or a full simulator is available during planning time. To accommodate more realistic scenarios, when such information is not available, this paper presents a policy-based reinforcement learning approach, which learns the agent policies based solely on trajectories generated by previous interaction with the environment (e.g., demonstrations). We show that our approach is able to generate valid macro-action controllers and develop an expectationmaximization (EM) algorithm (called Policy-based EM or PoEM), which has convergence guarantees for batch learning. Our experiments show PoEM is a scalable learning method that can learn optimal policies and improve upon hand-coded “expert” solutions.

We consider an autonomous agent operating in a stochastic, partially-observable, multiagent environment, that explicitly models the other agents as probabilistic deterministic finite-state controllers (PDFCs) in order to predict their actions. We assume that such models are not given to the agent, but instead must be learned from (possibly imperfect) observations of the other agents' behavior. The agent maintains a belief over the other agents' models, that is updated via Bayesian inference. To represent this belief we place a flexible stick-breaking distribution over PDFCs, that allows the posterior to concentrate around controllers whose size is not bounded and scales with the complexity of the observed data. Since this Bayesian inference task is not analytically tractable, we devise a Markov chain Monte Carlo algorithm to approximate the posterior distribution. The agent then embeds the result of this inference into its own decision making process using the interactive POMDP framework. We show that our learning algorithm can learn agent models that are behaviorally accurate for problems of varying complexity, and that the agent's performance increases as a result.

The decision to take vaccinations and other protective interventions for avoiding an infection is a natural game-theoretic setting. Most of the work on vaccination games has focused on decisions at the start of an epidemic. However, a lot of people defer their vaccination decisions, in practice. For example, in the case of the seasonal flu, vaccination rates gradually increase, as the epidemic rate increases. This motivates the study of temporal vaccination games, in which vaccination decisions can be made more than once. An important issue in the context of temporal decisions is that of resource limitations, which may arise due to production and distribution constraints. While there has been some work on temporal vaccination games, resource constraints have not been considered. In this paper, we study temporal vaccination games for epidemics in the SI (susceptible-infectious) model, with resource constraints in the form of a repeated game in complex social networks, with budgets on the number of vaccines that can be taken at any time. We find that the resource constraints and the vaccination and infection costs have a significant impact on the structure of Nash equilibria (NE). In general, the budget constraints can cause NE to become very inefficient, and finding efficient NE as well as the social optimum are NP-hard problems. We develop algorithms for finding NE and approximating the social optimum. We evaluate our results using simulations on different kinds of networks.