Total: 29

We consider the problem of compositionally computing upper bounds on lengths of plans. Following existing work, our approach is based on a decomposition of state-variable dependency graphs (a.k.a. causal graphs). Tight bounds have been demonstrated previously for problems where key dependencies flow in a single direction—i.e. manipulating variable v1 can disturb the ability to manipulate v2 and not vice versa. We develop a more general bounding approach which allows us to compute useful bounds where dependency flows in both directions. Our approach is practically most useful when combined with earlier approaches, where the computed bounds are substantially improved in a relatively broad variety of problems. When combined with an existing planning procedure, the improved bounds yield coverage improvements for both solvable and unsolvable planning problems.

In autonomous exploration a mobile agent must adapt to new measurements to seek high reward, but disturbances cause a probability of collision that must be traded off against expected reward. This paper considers an autonomous agent tasked with maximizing measurements from a Gaussian Process while subject to unbounded disturbances. We seek an adaptive policy in which the maximum allowed probability of failure is constrained as a function of the expected reward. The policy is found using an extension to Monte Carlo Tree Search (MCTS) which bounds probability of failure. We apply MCTS to a sequence of approximating problems, which allows constraint satisfying actions to be found in an anytime manner. Our innovation lies in defining the approximating problems and replanning strategy such that the probability of failure constraint is guaranteed to be satisfied over the true policy. The approach does not need to plan for all measurements explicitly or constrain planning based only on the measurements that were observed. To the best of our knowledge, our approach is the first to enforce probability of failure constraints in adaptive sampling. Through experiments on real bathymetric data and simulated measurements, we show our algorithm allows an agent to take dangerous actions only when the reward justifies the risk. We then verify through Monte Carlo simulations that failure bounds are satisfied.

HTN planning provides an expressive formalism to model complex application domains. It has been widely used in realworld applications. However, the development of domainindependent planning techniques for such models is still lacking behind. The need to be informed about both statetransitions and the task hierarchy makes the realisation of search-based approaches difficult, especially with unrestricted partial ordering of tasks in HTN domains. Recently, a translation of HTN planning problems into propositional logic has shown promising empirical results. Such planners benefit from a unified representation of state and hierarchy, but until now require very large formulae to represent partial order. In this paper, we introduce a novel encoding of HTN Planning as SAT. In contrast to related work, most of the reasoning on ordering relations is not left to the SAT solver, but done beforehand. This results in much smaller formulae and, as shown in our evaluation, in a planner that outperforms previous SAT-based approaches as well as the state-of-the-art in search-based HTN planning.

Recent advances in applying deep learning to planning have shown that Deep Reactive Policies (DRPs) can be powerful for fast decision-making in complex environments. However, an important limitation of current DRP-based approaches is either the need of optimal planners to be used as ground truth in a supervised learning setting or the sample complexity of high-variance policy gradient estimators, which are particularly troublesome in continuous state-action domains. In order to overcome those limitations, we introduce a framework for training DRPs in continuous stochastic spaces via gradient-based policy search. The general approach is to explicitly encode a parametric policy as a deep neural network, and to formulate the probabilistic planning problem as an optimization task in a stochastic computation graph by exploiting the re-parameterization of the transition probability densities; the optimization is then solved by leveraging gradient descent algorithms that are able to handle non-convex objective functions. We benchmark our approach against stochastic planning domains exhibiting arbitrary differentiable nonlinear transition and cost functions (e.g., Reservoir Control, HVAC and Navigation). Results show that DRPs with more than 125,000 continuous action parameters can be optimized by our approach for problems with 30 state fluents and 30 action fluents on inexpensive hardware under 6 minutes. Also, we observed a speedup of 5 orders of magnitude in the average inference time per decision step of DRPs when compared to other state-of-the-art online gradient-based planners when the same level of solution quality is required.

To achieve practical execution, planners must produce temporal plans with some degree of run-time adaptability. Such plans can be expressed as Simple Temporal Networks (STN), that constrain the timing of action activations, and implicitly represent the space of choices for the plan executor. A first problem is to verify that all the executor choices allowed by the STN plan will be successful, i.e. the plan is valid. An even more important problem is to assess the effect of discrepancies between the model used for planning and the execution environment. We propose an approach to compute the “robustness envelope” (i.e., alternative action durations or resource consumption rates) of a given STN plan, for which the plan remains valid. Plans can have boolean and numeric variables as well as discrete and continuous change. We leverage Satisfiability Modulo Theories (SMT) to make the approach formal and practical.

Macro-operators, macros for short, are a well-known technique for enhancing performance of planning engines by providing “short-cuts” in the state space. Existing macro learning systems usually generate macros from most frequent sequences of actions in training plans. Such approach priorities frequently used sequences of actions over meaningful activities to be performed for solving planning tasks. This paper presents a technique that, inspired by resource locking in critical sections in parallel computing, learns macros capturing activities in which a limited resource (e.g., a robotic hand) is used. In particular, such macros capture the whole activity in which the resource is “locked” (e.g., the robotic hand is holding an object) and thus “bridge” states in which the resource is locked and cannot be used. We also introduce an “aggressive” variant of our technique that removes original operators superseded by macros from the domain model. Usefulness of macros is evaluated on several stateof-the-art planners, and a wide range of benchmarks from the learning tracks of the 2008 and 2011 editions of the International Planning Competition.

When performing temporal planning as forward state-space search, effective state memoisation is challenging. Whereas in classical planning, two states are equal if they have the same facts and variable values, in temporal planning this is not the case: as the plans that led to the two states are subject to temporal constraints, one might be extendable into at temporally valid plan, while the other might not. In this paper, we present an approach for reducing the state space explosion that arises due to having to keep many copies of the same ‘classically’ equal state – states that are classically equal are aggregated into metastates, and these are separated lazily only in the case of temporal inconsistency. Our evaluation shows that this approach, implemented in OPTIC and compared to existing state-of-the-art memoisation techniques, improves performance across a range of temporal domains.

In this paper we present techniques for reasoning natively with quantitative/qualitative interval constraints in statebased PDDL planners. While these are considered important in modeling and solving problems in timeline based planners; reasoning with these in PDDL planners has seen relatively little attention, yet is a crucial step towards making PDDL planners applicable in real-world scenarios, such as space missions. Our main contribution is to extend the planner OPTIC to reason natively with Allen interval constraints. We show that our approach outperforms both MTP, the only PDDL planner capable of handling similar constraints and a compilation to PDDL 2.1, by an order of magnitude. We go on to present initial results indicating that our approach is competitive with a timeline based planner on a Mars rover domain, showing the potential of PDDL planners in this setting.

Cloud computing has been widely adopted to support various computation services. A fundamental problem faced by cloud providers is how to efficiently allocate resources upon user requests and price the resource usage, in order to maximize resource efficiency and hence provider profit. Existing studies establish detailed performance models of cloud resource usage, and propose offline or online algorithms to decide allocation and pricing. Differently, we adopt a blackbox approach, and leverage model-free Deep Reinforcement Learning (DRL) to capture dynamics of cloud users and better characterize inherent connections between an optimal allocation/pricing policy and the states of the dynamic cloud system. The goal is to learn a policy that maximizes net profit of the cloud provider through trial and error, which is better than decisions made on explicit performance models. We combine long short-term memory (LSTM) units with fully-connected neural networks in our DRL to deal with online user arrivals, and adjust the output and update methods of basic DRL algorithms to address both resource allocation and pricing. Evaluation based on real-world datasets shows that our DRL approach outperforms basic DRL algorithms and state-of-theart white-box online cloud resource allocation/pricing algorithms significantly, in terms of both profit and the number of accepted users.

In real-time planning, the planner must select the next action within a fixed time bound. Because a complete plan may not have been found, the selected action might not lead to a goal and the agent may need to return to its current state. To preserve completeness, real-time search methods incorporate learning, in which heuristic values are updated. Previous work in real-time search has used table-based heuristics, in which the values of states are updated individually. In this paper, we explore the use of abstraction-based heuristics. By refining the abstraction on-line, we can update the values of multiple states, including ones the agent has not yet generated. We test this idea empirically using Cartesian abstractions in the Fast Downward planner. Results on various benchmarks, including the sliding tile puzzle and several IPC domains, indicate that the approach can improve performance compared to traditional heuristic updating. This work brings abstraction refinement, a powerful technique from offline planning, into the real-time setting.

Simplifying classical planning tasks by removing operators while preserving at least one optimal solution can significantly enhance the performance of planners. In this paper, we introduce the notion of operator mutex, which is a set of operators that cannot all be part of the same (strongly) optimal plan. We propose four different methods for inference of operator mutexes and experimentally verify that they can be found in a sizable number of planning tasks. We show how operator mutexes can be used in combination with structural symmetries to safely remove operators from the planning task.

In this work we present a novel approach to solving concurrent multiagent planning problems in which several agents act in parallel. Our approach relies on a compilation from concurrent multiagent planning to classical planning, allowing us to use an off-the-shelf classical planner to solve the original multiagent problem. The solution can be directly interpreted as a concurrent plan that satisfies a given set of concurrency constraints, while avoiding the exponential blowup associated with concurrent actions. Our planner is the first to handle action effects that are conditional on what other agents are doing. Theoretically, we show that the compilation is sound and complete. Empirically, we show that our compilation can solve challenging multiagent planning problems that require concurrent actions.

Current classical planners are very successful in finding (nonoptimal) plans, even for large planning instances. To do so, most planners rely on a preprocessing stage that computes a grounded representation of the task. Whenever the grounded task is too big to be generated (i.e., whenever this preprocess fails) the instance cannot even be tackled by the actual planner. To address this issue, we introduce a partial grounding approach that grounds only a projection of the task, when complete grounding is not feasible. We propose a guiding mechanism that, for a given domain, identifies the parts of a task that are relevant to find a plan by using off-the-shelf machine learning methods. Our empirical evaluation attests that the approach is capable of solving planning instances that are too big to be fully grounded.

We consider a class of generalized planning problems based on the idea of quantifying over sets of similar objects. We show how we can adapt fully observable nondeterministic planning techniques to produce generalized solutions that are easy to instantiate over particular problem instances. We also describe how we can reformulate a classical planning problem into a quantified one. The reformulation allows us to solve the original planning task without grounding every action with respect to all objects in the problem, and a single solution can be applied to a possibly infinite set of related classical planning tasks. We report experimental results that show our approach is a practical and promising technique for solving an interesting class of problems.

Red-black planning is a state-of-the-art approach to satisficing classical planning. Red-black planning heuristics are at the heart of the planner Mercury, the runner-up of a satisficing track in the International Planning Competition (IPC) 2014 and a major component of four additional planners in IPC 2018, including Saarplan, the runner-up in the agile track. Mercury’s exceptional performance is amplified by the fact that conditional effects were handled by the planner in a trivial way, simply by compiling them away. Conditional effects, however, are important for classical planning, and many domains require them for efficient modeling. Consequently, we investigate the possibility of handling conditional effects directly in the red-black planning heuristic function, extending the algorithm for computing red-black plans to the conditional effects setting. We show empirically that red-black planning heuristics that handle conditional effects natively outperform the variants that compile this feature away, improving coverage on tasks where black variables exist by 19%.

Multi-Agent Path Finding (MAPF) has been widely studied in the AI community. For example, Conflict-Based Search (CBS) is a state-of-the-art MAPF algorithm based on a twolevel tree-search. However, previous MAPF algorithms assume that an agent occupies only a single location at any given time, e.g., a single cell in a grid. This limits their applicability in many real-world domains that have geometric agents in lieu of point agents. Geometric agents are referred to as “large” agents because they can occupy multiple points at the same time. In this paper, we formalize and study LAMAPF, i.e., MAPF for large agents. We first show how CBS can be adapted to solve LA-MAPF. We then present a generalized version of CBS, called Multi-Constraint CBS (MCCBS), that adds multiple constraints (instead of one constraint) for an agent when it generates a high-level search node. We introduce three different approaches to choose such constraints as well as an approach to compute admissible heuristics for the high-level search. Experimental results show that all MC-CBS variants outperform CBS by up to three orders of magnitude in terms of runtime. The best variant also outperforms EPEA* (a state-of-the-art A*-based MAPF solver) in all cases and MDD-SAT (a state-of-the-art reduction-based MAPF solver) in some cases.

Research in classical planning so far was mainly concerned with generating a satisficing or an optimal plan. However, if such systems are used to make decisions that are relevant to humans, one should also consider the ethical consequences generated plans can have. We address this challenge by analyzing in how far it is possible to generalize existing approaches of machine ethics to automatic planning systems. Traditionally, ethical principles are formulated in an actionbased manner, allowing to judge the execution of one action. We show how such a judgment can be generalized to plans. Further, we study the computational complexity of making ethical judgment about plans.

We study prioritized planning for Multi-Agent Path Finding (MAPF). Existing prioritized MAPF algorithms depend on rule-of-thumb heuristics and random assignment to determine a fixed total priority ordering of all agents a priori. We instead explore the space of all possible partial priority orderings as part of a novel systematic and conflict-driven combinatorial search framework. In a variety of empirical comparisons, we demonstrate state-of-the-art solution qualities and success rates, often with similar runtimes to existing algorithms. We also develop new theoretical results that explore the limitations of prioritized planning, in terms of completeness and optimality, for the first time.

The Multi-Agent Pickup and Delivery (MAPD) problem models applications where a large number of agents attend to a stream of incoming pickup-and-delivery tasks. Token Passing (TP) is a recent MAPD algorithm that is efficient and effective. We make TP even more efficient and effective by using a novel combinatorial search algorithm, called Safe Interval Path Planning with Reservation Table (SIPPwRT), for single-agent path planning. SIPPwRT uses an advanced data structure that allows for fast updates and lookups of the current paths of all agents in an online setting. The resulting MAPD algorithm TP-SIPPwRT takes kinematic constraints of real robots into account directly during planning, computes continuous agent movements with given velocities that work on non-holonomic robots rather than discrete agent movements with uniform velocity, and is complete for wellformed MAPD instances. We demonstrate its benefits for automated warehouses using both an agent simulator and a standard robot simulator. For example, we demonstrate that it can compute paths for hundreds of agents and thousands of tasks in seconds and is more efficient and effective than existing MAPD algorithms that use a post-processing step to adapt their paths to continuous agent movements with given velocities.

Most real-world problems have huge state and/or action spaces. Therefore, a naive application of existing tabular solution methods is not tractable on such problems. Nonetheless, these solution methods are quite useful if an agent has access to a relatively small state-action space homomorphism of the true environment and near-optimal performance is guaranteed by the map. A plethora of research is focused on the case when the homomorphism is a Markovian representation of the underlying process. However, we show that nearoptimal performance is sometimes guaranteed even if the homomorphism is non-Markovian.

Graph coloring is one of the most famous computational problems with applications in a wide range of areas such as planning and scheduling, resource allocation, and pattern matching. So far coloring problems are mostly studied on static graphs, which often stand in stark contrast to practice where data is inherently dynamic and subject to discrete changes over time. A temporal graph is a graph whose edges are assigned a set of integer time labels, indicating at which discrete time steps the edge is active. In this paper we present a natural temporal extension of the classical graph coloring problem. Given a temporal graph and a natural number ∆, we ask for a coloring sequence for each vertex such that (i) in every sliding time window of ∆ consecutive time steps, in which an edge is active, this edge is properly colored (i.e. its endpoints are assigned two different colors) at least once during that time window, and (ii) the total number of different colors is minimized. This sliding window temporal coloring problem abstractly captures many realistic graph coloring scenarios in which the underlying network changes over time, such as dynamically assigning communication channels to moving agents. We present a thorough investigation of the computational complexity of this temporal coloring problem. More specifically, we prove strong computational hardness results, complemented by efficient exact and approximation algorithms. Some of our algorithms are linear-time fixed-parameter tractable with respect to appropriate parameters, while others are asymptotically almost optimal under the Exponential Time Hypothesis (ETH).

In several industrial applications of planning, complex temporal metric trajectory constraints are needed to adequately model the problem at hand. For example, in production plants, items must be processed following a “recipe” of steps subject to precise timing constraints. Modeling such domains is very challenging in existing action-based languages due to the lack of sufficiently expressive trajectory constraints. We propose a novel temporal planning formalism allowing quantified temporal constraints over execution timing of action instances. We build on top of instantaneous actions borrowed from classical planning and add expressive temporal constructs. The paper details the semantics of our new formalism and presents a solving technique grounded in classical, heuristic forward search planning. Our experiments prove the proposed framework superior to alternative state-of-theart planning approaches on industrial benchmarks, and competitive with similar solving methods on well known benchmarks took from the planning competition.

Designing multi-agent systems, where several agents work in a shared environment, requires coordinating between the agents so they do not interfere with each other. One of the canonical approaches to coordinating agents is enacting a social law, which applies restrictions on agents’ available actions. A good social law prevents the agents from interfering with each other, while still allowing all of them to achieve their goals. Recent work took the first step towards reasoning about social laws using automated planning and showed how to verify if a given social law is robust, that is, allows all agents to achieve their goals regardless of what the other agents do. This work relied on a classical planning formalism, which assumed actions are instantaneous and some external scheduler chooses which agent acts next. However, this work is not directly applicable to multi-robot systems, because in the real world actions take time and the agents can act concurrently. In this paper, we show how the robustness of a social law in a continuous time setting can be verified through compilation to temporal planning. We demonstrate our work both theoretically and on real robots.

The most common representation formalisms for planning are descriptive models. They abstractly describe what the actions do and are tailored for efficiently computing the next state(s) in a state transition system. But acting requires operational models that describe how to do things, with rich control structures for closed-loop online decision-making. Using descriptive representations for planning and operational representations for acting can lead to problems with developing and verifying consistency of the different models. We define and implement an integrated acting-and-planning system in which both planning and acting use the same operational models, which are written in a general-purpose hierarchical task-oriented language offering rich control structures. The acting component is inspired by the well-known PRS system, except that instead of being purely reactive, it can get advice from the planner. Our planning algorithm, RAEplan, plans by doing Monte Carlo rollout simulations of the actor’s operational models. Our experiments show significant benefits in the efficiency of the acting and planning system.

Supervised learning methods have been widely applied to activity recognition. The prevalent success of existing methods, however, has two crucial prerequisites: proper feature extraction and sufficient labeled training data. The former is important to differentiate activities, while the latter is crucial to build a precise learning model. These two prerequisites have become bottlenecks to make existing methods more practical. Most existing feature extraction methods highly depend on domain knowledge, while labeled data requires intensive human annotation effort. Therefore, in this paper, we propose a novel method, named Distribution-based Semi-Supervised Learning, to tackle the aforementioned limitations. The proposed method is capable of automatically extracting powerful features with no domain knowledge required, meanwhile, alleviating the heavy annotation effort through semi-supervised learning. Specifically, we treat data stream of sensor readings received in a period as a distribution, and map all training distributions, including labeled and unlabeled, into a reproducing kernel Hilbert space (RKHS) using the kernel mean embedding technique. The RKHS is further altered by exploiting the underlying geometry structure of the unlabeled distributions. Finally, in the altered RKHS, a classifier is trained with the labeled distributions. We conduct extensive experiments on three public datasets to verify the effectiveness of our method compared with state-of-the-art baselines.