| Total: 19

The aim of this work is to explain the observed behaviour of a hybrid system (HS). The explanation problem is cast as finding a trajectory of the HS that matches some observations. By using the formalism of hybrid automata (HA), we characterize the explanations as the language of a network of HA that comprises one automaton for the HS and another one for the observations, thus restricting the behaviour of the HS exclusively to trajectories that explain the observations. We observe that this problem corresponds to a reachability problem in model-checking, but that state-of-the-art model checkers struggle to find concrete trajectories. To overcome this issue we provide a formal mapping from HA to PDDL+ and show how to use an off-the-shelf automated planner. An experimental analysis over domains with piece-wise constant, linear and nonlinear dynamics reveals that the proposed PDDL+ approach is much more efficient than solving directly the explanation problem with model-checking solvers.

Bin packing is a classic optimization problem with a wide range of applications from load balancing to supply chain management. In this work, we study the online variant of the problem, in which a sequence of items of various sizes must be placed into a minimum number of bins of uniform capacity. The online algorithm is enhanced with a (potentially erroneous) prediction concerning the frequency of item sizes in the sequence. We design and analyze online algorithms with efficient tradeoffs between the consistency (i.e., the competitive ratio assuming no prediction error) and the robustness (i.e., the competitive ratio under adversarial error), and whose performance degrades near-optimally as a function of the prediction error. This is the first theoretical and experimental study of online bin packing in the realistic setting of learnable predictions. Previous work addressed only extreme cases with respect to the prediction error, and relied on overly powerful and error-free oracles.

Using machine-learned predictions to create algorithms with better approximation guarantees is a very fresh and active field. In this work, we study classic scheduling problems under the learning augmented setting. More specifically, we consider the problem of scheduling jobs with arbitrary release dates on a single machine and the problem of scheduling jobs with a common release date on multiple machines. Our objective is to minimize the sum of completion times. For both problems, we propose algorithms which use predictions for taking their decisions. Our algorithms are consistent -- i.e. when the predictions are accurate, the performances of our algorithms are close to those of an optimal offline algorithm--, and robust -- i.e. when the predictions are wrong, the performance of our algorithms are close to those of an online algorithm without predictions. In addition, we confirm the above theoretical bounds by conducting experimental evaluation comparing the proposed algorithms to the offline optimal ones for both the single and multiple machines settings.

Reasoning about uncertainty is vital in many real-life autonomous systems. However, current state-of-the-art planning algorithms either cannot reason about uncertainty explicitly, or do so with high computational burden. Here, we focus on making informed decisions efficiently, using reward functions that explicitly deal with uncertainty. We formulate an approximation, namely an abstract observation model, that uses an aggregation scheme to alleviate computational costs. We derive bounds on the expected information-theoretic reward function and, as a consequence, on the value function. We then propose a method to refine aggregation to achieve identical action selection in a fraction of the computational time.

Several hierarchical planning systems feature a rich level of language features making them capable of expressing real-world problems. One such feature that's used by several current planning systems is causal links, which are used to track search progress. The formalism combining Hierarchical Task Network (HTN) planning with these links known from Partial Order Causal Link (POCL) planning is often referred to as hybrid planning. In this paper we study the computational complexity of such hybrid planning problems. More specifically, we provide missing membership results to existing hardness proofs and thereby provide tight complexity bounds for all known subclasses of hierarchical planning problems. We also re-visit and correct a result from the literature for plan verification showing that it remains NP-complete even in the absence of a task hierarchy.

In automated planning the ability of expressing constraints on the structure of the desired plans is important to deal with solution quality, as well as to express control knowledge. In PDDL3, this is supported through state-trajectory constraints corresponding to a class of LTLf formulae. In this paper, first we introduce a formalism to express trajectory constraints over actions in the plan, rather than over traversed states; Then we investigate compilation-based methods to deal with such constraints in propositional planning, and propose a new simple effective method. Finally, we experimentally study the usefulness of our action-trajectory constraints as a tool to express control knowledge. The experimental results show that the performance of a classical planner can be significantly improved by exploiting knowledge expressed by action constraints and handled by our compilation method, while the same knowledge turns out to be less beneficial when specified as state constraints and handled by two state-of-the-art systems supporting state constraints.

We consider shared autonomy systems where multiple operators (AI and human), can interact with the environment, e.g. by controlling a robot. The decision problem for the shared autonomy system is to select which operator takes control at each timestep, such that a reward specifying the intended system behaviour is maximised. The performance of the human operator is influenced by unobserved factors, such as fatigue or skill level. Therefore, the system must reason over stochastic models of operator performance. We present a framework for stochastic operators in shared autonomy systems (SO-SAS), where we represent operators using rich, partially observable models. We formalise SO-SAS as a mixed-observability Markov decision process, where environment states are fully observable and internal operator states are hidden. We test SO-SAS on a simulated domain and a computer game, empirically showing it results in better performance compared to traditional formulations of shared autonomy systems.

Recent work suggests to explain trade-offs between soft-goals in terms of their conflicts, i.e., minimal unsolvable soft-goal subsets. But this does not explain the conflicts themselves: Why can a given set of soft-goals not be jointly achieved? Here we approach that question in terms of the underlying constraints on plans in the task at hand, namely resource availability and time windows. In this context, a natural form of explanation for a soft-goal conflict is a minimal constraint relaxation under which the conflict disappears (``if the deadline was 1 hour later, it would work''). We explore algorithms for computing such explanations. A baseline is to simply loop over all relaxed tasks and compute the conflicts for each separately. We improve over this by two algorithms that leverage information -- conflicts, reachable states -- across relaxed tasks. We show that these algorithms can exponentially outperform the baseline in theory, and we run experiments confirming that advantage in practice.

How can we plan efficiently in a large and complex environment when the time budget is limited? Given the original simulator of the environment, which may be computationally very demanding, we propose to learn online an approximate but much faster simulator that improves over time. To plan reliably and efficiently while the approximate simulator is learning, we develop a method that adaptively decides which simulator to use for every simulation, based on a statistic that measures the accuracy of the approximate simulator. This allows us to use the approximate simulator to replace the original simulator for faster simulations when it is accurate enough under the current context, thus trading off simulation speed and accuracy. Experimental results in two large domains show that when integrated with POMCP, our approach allows to plan with improving efficiency over time.

Long range space missions, such as Rosetta, require robust plans of data-acquisition activities and of the resulting data transfers. In this paper we revisit the problem of assigning priorities to data transfers in order to maximize safety margin of onboard memory. We propose a fast sweep algorithm to verify the feasibility of a given priority assignment and we introduce an efficient exact algorithm to assign priorities on a single downlink window. We prove that the problem is NP-hard for several windows, and we propose several randomized heuristics to tackle the general case. Our experimental results show that the proposed approaches are able to improve the plans computed for the real mission by the previously existing method, while the sweep algorithm yields drastic accelerations.

We consider the mobile robot path planning problem for a class of recurrent reachability objectives. These objectives are parameterized by the expected time needed to visit one position from another, the expected square of this time, and also the frequency of moves between two neighboring locations. We design an efficient strategy synthesis algorithm for recurrent reachability objectives and demonstrate its functionality on non-trivial instances.

This paper studies a novel planning problem for multiple agents that cannot share holding resources, named OTIMAPP (Offline Time-Independent Multi-Agent Path Planning). Given a graph and a set of start-goal pairs, the problem consists in assigning a path to each agent such that every agent eventually reaches their goal without blocking each other, regardless of how the agents are being scheduled at runtime. The motivation stems from the nature of distributed environments that agents take actions fully asynchronous and have no knowledge about those exact timings of other actors. We present solution conditions, computational complexity, solvers, and robotic applications.

Model-reconciliation explanation is a popular framework for generating explanations for planning problems. While the framework has been extended to multiple settings since its introduction for classical planning problems, there is little agreement on the computational complexity of generating minimal model reconciliation explanations in the basic setting. In this paper, we address this lacuna by introducing a decision-version of the model-reconciliation explanation generation problem and we show that it is Sigma-2-P Complete.

While state-of-the-art planning systems need a grounded (propositional) task representation, the input model is provided "lifted", specifying predicates and action schemas with variables over a finite object universe. The size of the grounded model is exponential in predicate/action-schema arity, limiting applicability to cases where it is small enough. Recent work has taken up this challenge, devising an effective lifted forward search planner as basis for lifted heuristic search, as well as a variety of lifted heuristic functions based on the delete relaxation. Here we add a novel family of lifted heuristic functions, based on landmarks. We design two methods for landmark extraction in the lifted setting. The resulting heuristics exhibit performance advantages over previous heuristics in several benchmark domains. Especially the combination with lifted delete relaxation heuristics to a LAMA-style planner yields good results, beating the previous state of the art in lifted planning.

We investigate an extended version of the classical ski-rental problem with multiple commodities. A customer uses a set of commodities altogether, and he/she needs to choose payment options to cover the usage of each commodity without the knowledge of the future. The payment options of each commodity include (1) renting: to pay for an on-demand usage and (2) buying: to pay for the lifetime usage. It is a novel extension of the classical ski-rental problem which deals with only one commodity. To address this problem, we propose a new online algorithm called the Multi-Object Break-Even (MOBE) algorithm and conduct competitive analysis. We show that the tight lower and upper bounds of MOBE algorithm's competitive ratio are e/e-1 and 2 respectively against adaptive adversary under arbitrary renting and buying prices. We further prove that MOBE algorithm is an optimal online algorithm if commodities have the same rent-to-buy ratio. Numerical results verify our theoretical conclusion and demonstrate the advantages of MOBE in a real-world scenario.

A large number of emergency humanitarian rescue demands in conflict areas around the world are accompanied by intentional, persistent and unpredictable attacks on rescuers and supplies. Unfortunately, existing work on humanitarian relief planning mostly ignores this challenge in reality resulting a parlous and short-sighted relief distribution plan to a large extent. To address this, we first propose an offline multi-stage optimization problem of emergency relief planning under intentional attacks, in which all parameters in the game between the rescuer and attacker are supposed to be known or predictable. Then, an online version of this problem is introduced to meet the need of online and irrevocable decision making when those parameters are revealed in an online fashion. To achieve a far-sighted emergency relief planning under attacks, we design an online learning approach which is proven to obtain a near-optimal solution of the offline problem when those online reveled parameters are i.i.d. sampled from an unknown distribution. Finally, extensive experiments on a real anti-Ebola relief planning case based on the data of Ebola outbreak and armed attacks in DRC Congo show the scalability and effectiveness of our approach.

A longstanding objective in classical planning is to synthesize policies that generalize across multiple problems from the same domain. In this work, we study generalized policy search-based methods with a focus on the score function used to guide the search over policies. We demonstrate limitations of two score functions --- policy evaluation and plan comparison --- and propose a new approach that overcomes these limitations. The main idea behind our approach, Policy-Guided Planning for Generalized Policy Generalization (PG3), is that a candidate policy should be used to guide planning on training problems as a mechanism for evaluating that candidate. Theoretical results in a simplified setting give conditions under which PG3 is optimal or admissible. We then study a specific instantiation of policy search where planning problems are PDDL-based and policies are lifted decision lists. Empirical results in six domains confirm that PG3 learns generalized policies more efficiently and effectively than several baselines.

Qualitative numeric planning (QNP) is classical planning extended with non-negative real variables that can be increased or decreased by some arbitrary amount. Existing approaches for solving QNP problems are exclusively based on compilation to fully observable nondeterministic planning (FOND) problems or FOND+ problems, i.e., FOND problems with explicit fairness assumptions. However, the FOND-compilation approaches suffer from some limitations, such as difficulties to generate all strong cyclic solutions for FOND problems or introducing a great many extra variables and actions. In this paper, we propose a simpler characterization of QNP solutions and a new approach to solve QNP problems based on directly searching for a solution, which is a closed and terminating subgraph that contains a goal node, in the AND/OR graphs induced by QNP problems. Moreover, we introduce a pruning strategy based on termination tests on subgraphs. We implemented a native solver DSET based on the proposed approach and compared the performance of it with that of the two compilation-based approaches. Experimental results show that DSET is faster than the FOND-compilation approach by one order of magnitude, and comparable with the FOND+-compilation approach.

A major challenge for ridesharing platforms is to guarantee profit and fairness simultaneously, especially in the presence of misaligned incentives of drivers and riders. We focus on the dispatching-pricing problem to maximize the total revenue while keeping both drivers and riders satisfied. We study the computational complexity of the problem, provide a novel two-phased pricing solution with revenue and fairness guarantees, extend it to stochastic settings and develop a dynamic (a.k.a., learning-while-doing) algorithm that actively collects data to learn the demand distribution during the scheduling process. We also conduct extensive experiments to demonstrate the effectiveness of our algorithms.