| Total: 40

We present a simple and concise semantics for temporal planning. Our semantics are developed and formalised in the logic of the interactive theorem prover Isabelle/HOL. We derive from those semantics a validation algorithm for temporal planning and show, using a formal proof in Isabelle/HOL, that this validation algorithm implements our semantics. We experimentally evaluate our verified validation algorithm and show that it is practical.

Most approaches for goal recognition rely on specifications of the possible dynamics of the actor in the environment when pursuing a goal. These specifications suffer from two key issues. First, encoding these dynamics requires careful design by a domain expert, which is often not robust to noise at recognition time. Second, existing approaches often need costly real-time computations to reason about the likelihood of each potential goal. In this paper, we develop a framework that combines model-free reinforcement learning and goal recognition to alleviate the need for careful, manual domain design, and the need for costly online executions. This framework consists of two main stages: Offline learning of policies or utility functions for each potential goal, and online inference. We provide a first instance of this framework using tabular Q-learning for the learning stage, as well as three measures that can be used to perform the inference stage. The resulting instantiation achieves state-of-the-art performance against goal recognizers on standard evaluation domains and superior performance in noisy environments.

In the online (time-series) search problem, a player is presented with a sequence of prices which are revealed in an online manner. In the standard definition of the problem, for each revealed price, the player must decide irrevocably whether to accept or reject it, without knowledge of future prices (other than an upper and a lower bound on their extreme values), and the objective is to minimize the competitive ratio, namely the worst case ratio between the maximum price in the sequence and the one selected by the player. The problem formulates several applications of decision-making in the face of uncertainty on the revealed samples. Previous work on this problem has largely assumed extreme scenarios in which either the player has almost no information about the input, or the player is provided with some powerful, and error-free advice. In this work, we study learning-augmented algorithms, in which there is a potentially erroneous prediction concerning the input. Specifically, we consider two different settings: the setting in which the prediction is related to the maximum price in the sequence, as well as well as the setting in which the prediction is obtained as a response to a number of binary queries. For both settings, we provide tight, or near-tight upper and lower bounds on the worst-case performance of search algorithms as a function of the prediction error. We also provide experimental results on data obtained from stock exchange markets that confirm the theoretical analysis, and explain how our techniques can be applicable to other learning-augmented applications.

Goal recognition design (GRD) is the task of modifying environments for aiding observers to recognize the objectives of agents during online observations. The worst case distinctiveness (WCD), a widely used performance measure in GRD research, can fail to provide useful guidance to the redesign process when some goals are too hard to be distinguished. Moreover, the existing WCD-based approaches do not work when an agent aims for a sequence of goals instead of just one goal. The paper presents a new GRD framework called extended goal recognition design (EGRD) for goal recognition that involves multiple goals. The objective of EGRD is to modify an environment to minimize the worst case distinctiveness of a goal condition that describes how an agent can reach a set of goals. A goal condition can be formally expressed in first-order computation tree logic (FO-CTL) that can be evaluated by model checking. We introduce a novel graphical representation of FO-CTL sentences that is suitable for extended goal recognition. Moreover, we present a search algorithm for EGRD with a novel caching mechanism. Our experimental results show that the caching mechanism can greatly speed up our EGRD search algorithm by reusing the previous evaluation of FO-CTL sentences.

Controllers for autonomous systems that operate in safety-critical settings must account for stochastic disturbances. Such disturbances are often modeled as process noise, and common assumptions are that the underlying distributions are known and/or Gaussian. In practice, however, these assumptions may be unrealistic and can lead to poor approximations of the true noise distribution. We present a novel planning method that does not rely on any explicit representation of the noise distributions. In particular, we address the problem of computing a controller that provides probabilistic guarantees on safely reaching a target. First, we abstract the continuous system into a discrete-state model that captures noise by probabilistic transitions between states. As a key contribution, we adapt tools from the scenario approach to compute probably approximately correct (PAC) bounds on these transition probabilities, based on a finite number of samples of the noise. We capture these bounds in the transition probability intervals of a so-called interval Markov decision process (iMDP). This iMDP is robust against uncertainty in the transition probabilities, and the tightness of the probability intervals can be controlled through the number of samples. We use state-of-the-art verification techniques to provide guarantees on the iMDP, and compute a controller for which these guarantees carry over to the autonomous system. Realistic benchmarks show the practical applicability of our method, even when the iMDP has millions of states or transitions.

Reactive synthesis from high-level specifications that combine hard constraints expressed in Linear Temporal Logic (LTL) with soft constraints expressed by discounted sum (DS) rewards has applications in planning and reinforcement learning. An existing approach combines techniques from LTL synthesis with optimization for the DS rewards but has failed to yield a sound algorithm. An alternative approach combining LTL synthesis with satisficing DS rewards (rewards that achieve a threshold) is sound and complete for integer discount factors, but, in practice, a fractional discount factor is desired. This work extends the existing satisficing approach, presenting the first sound algorithm for synthesis from LTL and DS rewards with fractional discount factors. The utility of our algorithm is demonstrated on robotic planning domains.

Translation-based approaches to planning allow for solving problems in complex and expressive formalisms via the means of highly efficient solvers for simpler formalisms. To be effective, these translations have to be constructed appropriately. The current existing translation of the highly expressive formalism of HTN planning into the more simple formalism of classical planning is not on par with the performance of current dedicated HTN planners. With our contributions in this paper, we close this gap: we describe new versions of the translation that reach the performance of state-of-the-art dedicated HTN planners. We present new translation techniques both for the special case of totally-ordered HTNs as well as for the general partially-ordered case. In the latter, we show that our new translation generates only linearly many actions, while the previous encoding generates and exponential number of actions.

For users to trust planning algorithms, they must be able to understand the planner's outputs and the reasons for each action selection. This output does not tend to be user-friendly, often consisting of sequences of parametrised actions or task networks. And these may not be practical for non-expert users who may find it easier to read natural language descriptions. In this paper, we propose PlanVerb, a domain and planner-independent method for the verbalization of task plans. It is based on semantic tagging of actions and predicates. Our method can generate natural language descriptions of plans including causal explanations. The verbalized plans can be summarized by compressing the actions that act on the same parameters. We further extend the concept of verbalization space, previously applied to robot navigation, and apply it to planning to generate different kinds of plan descriptions for different user requirements. Our method can deal with PDDL and RDDL domains, provided that they are tagged accordingly. Our user survey evaluation shows that users can read our automatically generated plan descriptions and that the explanations help them answer questions about the plan.

Effective decision making while competing for limited resources in adversarial environments is important for many real-world applications (e.g. two Taxi companies competing for customers). Decision-making techniques such as Automated planning have to take into account possible actions of adversary (or competing) agents. That said, the agent should know what the competitor will likely do and then generate its plan accordingly. In this paper we propose a novel approach for estimating strategies of the adversary (or the competitor), sampling its actions that might hinder agent's goals by interfering with the agent's actions. The estimated competitor strategies are used in plan generation such that agent's actions have to be applied prior to the ones of the competitor, whose estimated times dictate the deadlines. We empirically evaluate our approach leveraging sampling of competitor's actions by comparing it to the naive approach optimising the make-span (not taking the competing agent into account at all) and to Nash Equilibrium (mixed) strategies.

Heuristics for lifted planning are not yet as informed as the best heuristics for ground planning. Recent work introduced the idea of using Datalog programs to compute the additive heuristic over lifted tasks. Based on this work, we show how to compute the more informed FF heuristic in a lifted manner. We extend the Datalog program with executable annotations that can also be used to define other delete-relaxation heuristics. In our experiments, we show that a planner using the lifted FF implementation produces state-of-the-art results for lifted planners. It also reduces the gap to state-of-the-art ground planners in domains where grounding is feasible.

One of the most widespread human behavioral biases is the present bias -- the tendency to overestimate current costs by a bias factor. Kleinberg and Oren (2014) introduced an elegant graph-theoretical model of inconsistent planning capturing the behavior of a present-biased agent accomplishing a set of actions. The essential measure of the system introduced by Kleinberg and Oren is the cost of irrationality -- the ratio of the total cost of the actions performed by the present-biased agent to the optimal cost. This measure is vital for a task designer to estimate the aftermaths of human behavior related to time-inconsistent planning, including procrastination and abandonment. As we prove in this paper, the cost of irrationality is highly susceptible to the agent's choices when faced with a few possible actions of equal estimated costs. To address this issue, we propose a modification of Kleinberg-Oren's model of inconsistent planning. In our model, when an agent selects from several options of minimum prescribed cost, he uses a randomized procedure. We explore the algorithmic complexity of computing and estimating the cost of irrationality in the new model.

Exploring unknown environments is a fundamental task in many domains, e.g., robot navigation, network security, and internet search. We initiate the study of a learning-augmented variant of the classical, notoriously hard online graph exploration problem by adding access to machine-learned predictions. We propose an algorithm that naturally integrates predictions into the well-known Nearest Neighbor (NN) algorithm and significantly outperforms any known online algorithm if the prediction is of high accuracy while maintaining good guarantees when the prediction is of poor quality. We provide theoretical worst-case bounds that gracefully degrade with the prediction error, and we complement them by computational experiments that confirm our results. Further, we extend our concept to a general framework to robustify algorithms. By interpolating carefully between a given algorithm and NN, we prove new performance bounds that leverage the individual good performance on particular inputs while establishing robustness to arbitrary inputs.

Since no classical planner consistently outperforms all others, it is important to select a planner that works well for a given classical planning task. The two strongest approaches for planner selection use image and graph convolutional neural networks. They have the drawback that the learned models are complicated and uninterpretable. To obtain explainable models, we identify a small set of simple task features and show that elementary and interpretable machine learning techniques can use these features to solve roughly as many tasks as the complex approaches based on neural networks.

Symbolic search, using Binary Decision Diagrams (BDDs) to represent sets of states, is a competitive approach to optimal planning. Yet heuristic search in this context remains challenging. The many advances on admissible planning heuristics are not directly applicable, as they evaluate one state at a time. Indeed, progress using heuristic functions in symbolic search has been limited and even very informed heuristics have been shown to be detrimental. Here we show how this connection can be made stronger for LP-based potential heuristics. Our key observation is that, for this family of heuristic functions, the change of heuristic value induced by each operator can be precomputed. This facilitates their smooth integration into symbolic search. Our experiments show that this can pay off significantly: we establish a new state of the art in optimal symbolic planning.

Reconfiguring two shortest paths in a graph means modifying one shortest path to the other by changing one vertex at a time, so that all the intermediate paths are also shortest paths. This problem has several natural applications, namely: (a) revamping road networks, (b) rerouting data packets in a synchronous multiprocessing setting, (c) the shipping container stowage problem, and (d) the train marshalling problem. When modelled as graph problems, (a) is the most general case while (b), (c) and (d) are restrictions to different graph classes. We show that (a) is intractable, even for relaxed variants of the problem. For (b), (c) and (d), we present efficient algorithms to solve the respective problems. We also generalise the problem to when at most k (for some k >= 2) contiguous vertices on a shortest path can be changed at a time.

Classical planning tasks are modelled in PDDL which is a schematic language based on first-order logic. Most of the current planners turn this lifted representation into a propositional one via a grounding process. However, grounding may cause an exponential blowup. Therefore it is important to investigate methods for searching for plans on the lifted level. To build a lifted state-based planner, it is necessary to invent lifted heuristics. We introduce maps between PDDL tasks preserving plans allowing to transform a PDDL task into a smaller one. We propose a novel method for computing lifted (admissible) delete-free relaxed heuristics via grounding of the smaller task and computing the (admissible) delete-free relaxed heuristics there. This allows us to transfer the knowledge about relaxed heuristics from the grounded level to the lifted level.

A Simple Temporal Network with Uncertainty (STNU) includes real-valued variables, called time-points; binary difference constraints on those time-points; and contingent links that represent actions with uncertain durations. STNUs have been used for robot control, web-service composition, and business processes. The most important property of an STNU is called dynamic controllability (DC); and algorithms for checking this property are called DC-checking algorithms. The DC-checking algorithm for STNUs with the best worst-case time-complexity is the RUL¯ algorithm due to Cairo, Hunsberger and Rizzi. Its complexity is O(mn + k²n + kn log n), where n is the number of time-points, m is the number of constraints, and k is the number of contingent links. It is expected that this worst-case complexity cannot be improved upon. However, this paper provides a new algorithm, called RUL2021, that improves its performance in practice by an order of magnitude, as demonstrated by a thorough empirical evaluation.

Recent deep models for solving routing problems always assume a single distribution of nodes for training, which severely impairs their cross-distribution generalization ability. In this paper, we exploit group distributionally robust optimization (group DRO) to tackle this issue, where we jointly optimize the weights for different groups of distributions and the parameters for the deep model in an interleaved manner during training. We also design a module based on convolutional neural network, which allows the deep model to learn more informative latent pattern among the nodes. We evaluate the proposed approach on two types of well-known deep models including GCN and POMO. The experimental results on the randomly synthesized instances and the ones from two benchmark dataset (i.e., TSPLib and CVRPLib) demonstrate that our approach could significantly improve the cross-distribution generalization performance over the original models.

We consider the problem of learning action models for planning in unknown stochastic environments that can be defined using the Probabilistic Planning Domain Description Language (PPDDL). As input, we are given a set of previously executed trajectories, and the main challenge is to learn an action model that has a similar goal achievement probability to the policies used to create these trajectories. To this end, we introduce a variant of PPDDL in which there is uncertainty about the transition probabilities, specified by an interval for each factor that contains the respective true transition probabilities. Then, we present SAM+, an algorithm that learns such an imprecise-PPDDL environment model. SAM+ has a polynomial time and sample complexity, and guarantees that with high probability, the true environment is indeed captured by the defined intervals. We prove that the action model SAM+ outputs has a goal achievement probability that is almost as good or better than that of the policies used to produced the training trajectories. Then, we show how to produce a PPDDL model based on this imprecise-PPDDL model that has similar properties.

Diverse planning is an important problem in automated planning with many real world applications. Recently, diverse planning has seen renewed interest, with work that defines a taxonomy of computational problems with respect to both plan quality and solution diversity. However, despite the recent advances in diverse planning, the variety of approaches and the number of available planners are still quite limited, even nonexistent for several computational problems. In this work, we aim to extend the portfolio of planners for various computational problems in diverse planning. To that end, we introduce a novel approach to finding solutions for three computational problems within diverse planning and present planners for these three problems. For one of these problems, our approach is the first one that is able to provide solutions to the problem. For another, we show that top-k and top quality planners can provide, albeit naive, solutions to the problem and we extend these planners to improve the diversity of the solution. Finally, for the third problem, we show that some existing diverse planners already provide solutions to that problem. We suggest another approach and empirically show it to compare favorably with these existing planners.

Oversubscription planning (OSP) is the problem of finding plans that maximize the utility value of their end state while staying within a specified cost bound. Recently, it has been shown that OSP problems can be reformulated as classical planning problems with multiple cost functions but no utilities. Here we take advantage of this reformulation to show that OSP problems can be solved optimally using the A* search algorithm, in contrast to previous approaches that have used variations on branch-and-bound search. This allows many powerful techniques developed for classical planning to be applied to OSP problems. We also introduce novel bound-sensitive heuristics, which are able to reason about the primary cost of a solution while taking into account secondary cost functions and bounds, to provide superior guidance compared to heuristics that do not take these bounds into account. We propose two such bound-sensitive variants of existing classical planning heuristics, and show experimentally that the resulting search is significantly more informed than with comparable heuristics that do not consider bounds.

Integer programs provide a powerful abstraction for representing a wide range of real-world scheduling problems. Despite their ability to model general scheduling problems, solving large-scale integer programs (IP) remains a computational challenge in practice. The incorporation of more complex objectives such as robustness to disruptions further exacerbates the computational challenge. We present NICE (Neural network IP Coefficient Extraction), a novel technique that combines reinforcement learning and integer programming to tackle the problem of robust scheduling. More specifically, NICE uses reinforcement learning to approximately represent complex objectives in an integer programming formulation. We use NICE to determine assignments of pilots to a flight crew schedule so as to reduce the impact of disruptions. We compare NICE with (1) a baseline integer programming formulation that produces a feasible crew schedule, and (2) a robust integer programming formulation that explicitly tries to minimize the impact of disruptions. Our experiments show that, across a variety of scenarios, NICE produces schedules resulting in 33% to 48% fewer disruptions than the baseline formulation. Moreover, in more severely constrained scheduling scenarios in which the robust integer program fails to produce a schedule within 90 minutes, NICE is able to build robust schedules in less than 2 seconds on average.

In sequential decision making, objective specifications are often underspecified or incomplete, neglecting to take into account potential (negative) side effects. Executing plans without consideration of their side effects can lead to catastrophic outcomes -- a concern recently raised in relation to the safety of AI. In this paper we investigate how to avoid side effects in a symbolic planning setting. We study the notion of minimizing side effects in the context of a planning environment where multiple independent agents co-exist. We define (classes of) negative side effects in terms of their effect on the agency of those other agents. Finally, we show how plans which minimize side effects of different types can be computed via compilations to cost-optimizing symbolic planning, and investigate experimentally.

Recent advances in deep learning have enabled optimization of deep reactive policies (DRPs) for continuous MDP planning by encoding a parametric policy as a deep neural network and exploiting automatic differentiation in an end-to-end model-based gradient descent framework. This approach has proven effective for optimizing DRPs in nonlinear continuous MDPs, but it requires a large number of sampled trajectories to learn effectively and can suffer from high variance in solution quality. In this work, we revisit the overall model-based DRP objective and instead take a minorization-maximization perspective to iteratively optimize the DRP w.r.t. a locally tight lower-bounded objective. This novel formulation of DRP learning as iterative lower bound optimization (ILBO) is particularly appealing because (i) each step is structurally easier to optimize than the overall objective, (ii) it guarantees a monotonically improving objective under certain theoretical conditions, and (iii) it reuses samples between iterations thus lowering sample complexity. Empirical evaluation confirms that ILBO is significantly more sample-efficient than the state-of-the-art DRP planner and consistently produces better solution quality with lower variance. We additionally demonstrate that ILBO generalizes well to new problem instances (i.e., different initial states) without requiring retraining.

Learning linear temporal logic on finite traces (LTLf) formulae aims to learn a target formula that characterizes the high-level behavior of a system from observation traces in planning. Existing approaches to learning LTLf formulae, however, can hardly learn accurate LTLf formulae from noisy data. It is challenging to design an efficient search mechanism in the large search space in form of arbitrary LTLf formulae while alleviating the wrong search bias resulting from noisy data. In this paper, we tackle this problem by bridging LTLf inference to GNN inference. Our key theoretical contribution is showing that GNN inference can simulate LTLf inference to distinguish traces. Based on our theoretical result, we design a GNN-based approach, GLTLf, which combines GNN inference and parameter interpretation to seek the target formula in the large search space. Thanks to the non-deterministic learning process of GNNs, GLTLf is able to cope with noise. We evaluate GLTLf on various datasets with noise. Our experimental results confirm the effectiveness of GNN inference in learning LTLf formulae and show that GLTLf is superior to the state-of-the-art approaches.