| Total: 21

In the Multi-Agent Meeting problem (MAM), the task is to find a meeting location for multiple agents, as well as a path for each agent to that location. In this paper, we introduce MM*, a Multi-Directional Heuristic Search algorithm that finds the optimal meeting location under different cost functions. MM* generalizes the Meet in the Middle (MM) bidirectional search algorithm to the case of finding an optimal meeting location for multiple agents. Several admissible heuristics are proposed, and experiments demonstrate the benefits of MM*.

The formal synthesis of automated or autonomous agents has elicited strong interest from the artificial intelligence community in recent years. This problem space broadly entails the derivation of decision-making policies for agents acting in an environment such that a formal specification of behavior is satisfied. Popular formalisms for such specifications include the quintessential Linear Temporal Logic (LTL) and Computation Tree Logic (CTL) which reason over infinite sequences and trees, respectively, of states. However, the related and relevant problem of reasoning over the frequency with which states are visited infinitely and enforcing behavioral specifications on the same has received little attention. That problem, known as Steady-State Policy Synthesis (SSPS) or steady-state control, is the focus of this paper. Prior related work has been mostly confined to unichain Markov Decision Processes (MDPs), while a tractable solution to the general multichain setting heretofore remains elusive. In this paper, we provide a solution to the latter within the context of multichain MDPs over a class of policies that account for all possible transitions in the given MDP. The solution policy is derived from a novel linear program (LP) that encodes constraints on the limiting distributions of the Markov chain induced by said policy. We establish a one-to-one correspondence between the feasible solutions of the LP and the stationary distributions of the induced Markov chains. The derived policy is shown to maximize the reward among the constrained class of stationary policies and to satisfy the specification constraints even when it does not exercise all possible transitions.

In HTN planning, the hierarchy has a wide impact on solutions. First, there is (usually) no state-based goal given, the objective is given via the hierarchy. Second, it enforces actions to be in a plan. Third, planners are not allowed to add actions apart from those introduced via decomposition, i.e. via the hierarchy. However, no heuristic considers the interplay of hierarchy and actions in the plan exactly (without relaxation) because this makes heuristic calculation NP-hard even under delete relaxation. We introduce the problem class of delete- and ordering-free HTN planning as basis for novel HTN heuristics and show that its plan existence problem is still NP-complete. We then introduce heuristics based on the new class using an integer programming model to solve it.

Conflict-Based Search (CBS) is a leading algorithm for optimal Multi-Agent Path Finding (MAPF). CBS variants typically compute MAPF solutions using some form of A* search. However, they often do so under strict time limits so as to avoid exhausting the available memory. In this paper, we present IDCBS, an iterative-deepening variant of CBS which can be executed without exhausting the memory and without strict time limits. IDCBS can be substantially faster than CBS due to incremental methods that it uses when processing CBS nodes.

Justifying a plan to a user requires answering questions about the space of possible plans. Recent work introduced a framework for doing so via plan-property dependencies, where plan properties p are Boolean functions on plans, and p entails q if all plans that satisfy p also satisfy q. We extend this work in two ways. First, we introduce new algorithms for computing plan-property dependencies, leveraging symbolic search and devising pruning methods for this purpose. Second, while the properties p were previously limited to goal facts and so-called action-set (AS) properties, here we extend them to LTL. Our new algorithms vastly outperform the previous ones, and our methods for LTL cause little overhead on AS properties.

Although symbolic bidirectional search is successful in optimal classical planning, state-of-the-art satisficing planners do not use bidirectional search. Previous bidirectional search planners for satisficing planning behaved similarly to a trivial portfolio, which independently executes forward and backward search without the desired ``meet-in-the-middle'' behavior of bidirectional search where the forward and backward search frontiers intersect at some point relatively far from the forward and backward start states. In this paper, we propose Top-to-Top Bidirectional Search (TTBS), a new bidirectional search strategy with front-to-front heuristic evaluation. We show that TTBS strongly exhibits ``meet-in-the-middle'' behavior and can solve instances solved by neither forward nor backward search on a number of domains.

Efficient and truthful mechanisms to price time on remote servers/machines have been the subject of much work in recent years due to the importance of the cloud market. This paper considers online revenue maximization for a unit capacity server, when jobs are non preemptive, in the Bayesian setting: at each time step, one job arrives, with parameters drawn from an underlying distribution. We design an efficiently computable truthful posted price mechanism, which maximizes revenue in expectation and in retrospect, up to additive error. The prices are posted prior to learning the agent's type, and the computed pricing scheme is deterministic. We also show the pricing mechanism is robust to learning the job distribution from samples, where polynomially many samples suffice to obtain near optimal prices.

We study the problem of policy synthesis for uncertain partially observable Markov decision processes (uPOMDPs). The transition probability function of uPOMDPs is only known to belong to a so-called uncertainty set, for instance in the form of probability intervals. Such a model arises when, for example, an agent operates under information limitation due to imperfect knowledge about the accuracy of its sensors. The goal is to compute a policy for the agent that is robust against all possible probability distributions within the uncertainty set. In particular, we are interested in a policy that robustly ensures the satisfaction of temporal logic and expected reward specifications. We state the underlying optimization problem as a semi-infinite quadratically-constrained quadratic program (QCQP), which has finitely many variables and infinitely many constraints. Since QCQPs are non-convex in general and practically infeasible to solve, we resort to the so-called convex-concave procedure to convexify the QCQP. Even though convex, the resulting optimization problem still has infinitely many constraints and is NP-hard. For uncertainty sets that form convex polytopes, we provide a transformation of the problem to a convex QCQP with finitely many constraints. We demonstrate the feasibility of our approach by means of several case studies that highlight typical bottlenecks for our problem. In particular, we show that we are able to solve benchmarks with hundreds of thousands of states, hundreds of different observations, and we investigate the effect of different levels of uncertainty in the models.

Recurrent neural networks (RNNs) have emerged as an effective representation of control policies in sequential decision-making problems. However, a major drawback in the application of RNN-based policies is the difficulty in providing formal guarantees on the satisfaction of behavioral specifications, e.g. safety and/or reachability. By integrating techniques from formal methods and machine learning, we propose an approach to automatically extract a finite-state controller (FSC) from an RNN, which, when composed with a finite-state system model, is amenable to existing formal verification tools. Specifically, we introduce an iterative modification to the so-called quantized bottleneck insertion technique to create an FSC as a randomized policy with memory. For the cases in which the resulting FSC fails to satisfy the specification, verification generates diagnostic information. We utilize this information to either adjust the amount of memory in the extracted FSC or perform focused retraining of the RNN. While generally applicable, we detail the resulting iterative procedure in the context of policy synthesis for partially observable Markov decision processes (POMDPs), which is known to be notoriously hard. The numerical experiments show that the proposed approach outperforms traditional POMDP synthesis methods by 3 orders of magnitude within 2% of optimal benchmark values.

We consider the problem of planning with arithmetic theories, and focus on generating optimal plans for numeric domains with constant and state-dependent action costs. Solving these problems efficiently requires a seamless integration between propositional and numeric reasoning. We propose a novel approach that leverages Optimization Modulo Theories (OMT) solvers to implement a domain-independent optimal theory-planner. We present a new encoding for optimal planning in this setting and we evaluate our approach using well-known, as well as new, numeric benchmarks.

Partially observable Markov decision processes (POMDPs) with continuous state and observation spaces have powerful flexibility for representing real-world decision and control problems but are notoriously difficult to solve. Recent online sampling-based algorithms that use observation likelihood weighting have shown unprecedented effectiveness in domains with continuous observation spaces. However there has been no formal theoretical justification for this technique. This work offers such a justification, proving that a simplified algorithm, partially observable weighted sparse sampling (POWSS), will estimate Q-values accurately with high probability and can be made to perform arbitrarily near the optimal solution by increasing computational power.

This work investigates the problem of optimising a partial-order plan’s (POP) flexibility through the simultaneous transformation of its action ordering and variable binding constraints. While the former has been extensively studied through the notions of deordering and reordering, the latter has received much less attention. We show that a plan’s variable bindings are often related to resource usage and their reinstantiation can yield more flexible plans. To do so, we extend existing POP optimality criteria to support variable reinstantiation, and prove that checking if a plan can be optimised further is NP-complete. We also propose a MaxSAT-based technique for increasing plan flexibility and provide a thorough experimental evaluation that suggests that there are benefits in action reinstantiation.

Cost partitioning is a method for admissibly combining admissible heuristics. In this work, we extend this concept to merge-and-shrink (M&S) abstractions that may use labels that do not directly correspond to operators. We investigate how optimal and saturated cost partitioning (SCP) interact with M&S transformations and develop a method to compute SCPs during the computation of M&S. Experiments show that SCP significantly improves M&S on standard planning benchmarks.

Propositional Dynamic Epistemic Logic (DEL) provides an expressive framework for epistemic planning, but lacks desirable features that are standard in first-order planning languages (such as problem-independent action representations via action schemas). A recent epistemic planning formalism based on First-Order Dynamic Epistemic Logic (FODEL) combines the strengths of DEL (higher-order epistemics) with those of first-order languages (lifted representation), yielding benefits in terms of expressiveness and representational succinctness. This paper studies the plan existence problem for FODEL planning, showing that while the problem is generally undecidable, the cases of single-agent planning and multi-agent planning with non-modal preconditions are decidable.

Most existing works in Probabilistic Simple Temporal Networks (PSTNs) base their frameworks on well-defined probability distributions. This paper addresses on PSTN Dynamic Controllability (DC) robustness measure, i.e. the execution success probability of a network under dynamic control. We consider PSTNs where the probability distributions of the contingent edges are ordinary distributed (e.g. non-parametric, non-symmetric). We introduce the concepts of dispatching protocol (DP) as well as DP-robustness, the probability of success under a predefined dynamic policy. We propose a fixed-parameter pseudo-polynomial time algorithm to compute the exact DP-robustness of any PSTN under NextFirst protocol, and apply to various PSTN datasets, including the real case of planetary exploration in the context of the Mars 2020 rover, and propose an original structural analysis.

If a planning agent is considering taking a bus, for example, the time that passes during its planning can affect the feasibility of its plans, as the bus may depart before the agent has found a complete plan. Previous work on this situated temporal planning setting proposed an abstract deliberation scheduling scheme for maximizing the probability of finding a plan that is still feasible at the time it is found. In this paper, we extend the deliberation scheduling approach to address problems in which plans can differ in their cost. Like the planning deadlines, these costs can be uncertain until a complete plan has been found. We show that finding a deliberation policy that minimizes expected cost is PSPACE-hard and that even for known costs and deadlines the optimal solution is a contingent, rather than sequential, schedule. We then analyze special cases of the problem and use these results to propose a greedy scheme that considers both the uncertain deadlines and costs. Our empirical evaluation shows that the greedy scheme performs well in practice on a variety of problems, including some generated from planner search trees.

Width-based planning algorithms have been demonstrated to be competitive with state-of-the-art heuristic search and SAT-based approaches, without requiring access to a model of action effects and preconditions, just access to a black-box simulator. Width-based planners search is guided by a measure of the novelty of states, that requires observations on simulator states to be given as a set of features. This paper proposes agnostic feature mapping mechanisms that define the features online, as exploration progresses and the domain of continuous state variables is revealed. We demonstrate the effectiveness of these features on the OpenAI gym "classical control" suite of benchmarks. We compare our online planners with state-of-the-art deep reinforcement learning algorithms, and show that width-based planners using our features can find policies of the same quality with significantly less computational resources.

A major difficulty of solving continuous POMDPs is to infer the multi-modal distribution of the unobserved true states and to make the planning algorithm dependent on the perceived uncertainty. We cast POMDP filtering and planning problems as two closely related Sequential Monte Carlo (SMC) processes, one over the real states and the other over the future optimal trajectories, and combine the merits of these two parts in a new model named the DualSMC network. In particular, we first introduce an adversarial particle filter that leverages the adversarial relationship between its internal components. Based on the filtering results, we then propose a planning algorithm that extends the previous SMC planning approach [Piche et al., 2018] to continuous POMDPs with an uncertainty-dependent policy. Crucially, not only can DualSMC handle complex observations such as image input but also it remains highly interpretable. It is shown to be effective in three continuous POMDP domains: the floor positioning domain, the 3D light-dark navigation domain, and a modified Reacher domain.

Several scientific studies have reported the existence of the income gap among rideshare drivers based on demographic factors such as gender, age, race, etc. In this paper, we study the income inequality among rideshare drivers due to discriminative cancellations from riders, and the tradeoff between the income inequality (called fairness objective) with the system efficiency (called profit objective). We proposed an online bipartite-matching model where riders are assumed to arrive sequentially following a distribution known in advance. The highlight of our model is the concept of acceptance rate between any pair of driver-rider types, where types are defined based on demographic factors. Specially, we assume each rider can accept or cancel the driver assigned to her, each occurs with a certain probability which reflects the acceptance degree from the rider type towards the driver type. We construct a bi-objective linear program as a valid benchmark and propose two LP-based parameterized online algorithms. Rigorous online competitive ratio analysis is offered to demonstrate the flexibility and efficiency of our online algorithms in balancing the two conflicting goals, promotions of fairness and profit. Experimental results on a real-world dataset are provided as well, which confirm our theoretical predictions.

With the popularity of the Internet, traditional offline resource allocation has evolved into a new form, called online resource allocation. It features the online arrivals of agents in the system and the real-time decision-making requirement upon the arrival of each online agent. Both offline and online resource allocation have wide applications in various real-world matching markets ranging from ridesharing to crowdsourcing. There are some emerging applications such as rebalancing in bike sharing and trip-vehicle dispatching in ridesharing, which involve a two-stage resource allocation process. The process consists of an offline phase and another sequential online phase, and both phases compete for the same set of resources. In this paper, we propose a unified model which incorporates both offline and online resource allocation into a single framework. Our model assumes non-uniform and known arrival distributions for online agents in the second online phase, which can be learned from historical data. We propose a parameterized linear programming (LP)-based algorithm, which is shown to be at most a constant factor of 1/4 from the optimal. Experimental results on the real dataset show that our LP-based approaches outperform the LP-agnostic heuristics in terms of robustness and effectiveness.

In full-knowledge multi-robot adversarial patrolling, a group of robots have to detect an adversary who knows the robots' strategy. The adversary can easily take advantage of any deterministic patrolling strategy, which necessitates the employment of a randomised strategy. While the Markov decision process has been the dominant methodology in computing the penetration detection probabilities, we apply enumerative combinatorics to characterise the penetration detection probabilities. It allows us to provide the closed formulae of these probabilities and facilitates characterising optimal random defence strategies. Comparing to iteratively updating the Markov transition matrices, our methods significantly reduces the time and space complexity of solving the problem. We use this method to tackle four penetration configurations.