| Total: 22

We consider the setting in which executions of contract algorithms are scheduled in a processor so as to produce an interruptible system. Such algorithms offer a trade off between the quality of output and the available computation time, provided that the latter is known in advance. Previous work on this setting has provided strict performance guarantees for several variants of this setting, assuming that an interruption can occur arbitrarily ahead in the future. In practice, however, one expects that the schedule will reach a point beyond which further progress will only be marginal, hence it can be deemed complete. In this work we show how to optimize the time at which the system reaches a desired performance objective, while maintaining interruptible guarantees throughout the entire execution. The resulting schedule is provably optimal, and it guarantees that upon completion each individual contract algorithm has attained a predefined end guarantee.

Over the last years, several new approaches to Hierarchical Task Network (HTN) planning have been proposed that increased the overall performance of HTN planners. However, the focus has been on agile planning - on finding a solution as quickly as possible. Little work has been done on finding optimal plans. We show how the currently best-performing approach to HTN planning - the translation into propositional logic - can be utilised to find optimal plans. Such SAT-based planners usually bound the HTN problem to a certain depth of decomposition and then translate the problem into a propositional formula. To generate optimal plans, the length of the solution has to be bounded instead of the decomposition depth. We show the relationship between these bounds and how it can be handled algorithmically. Based on this, we propose an optimal SAT-based HTN planner and show that it performs favourably on a benchmark set.

Simple Temporal Networks with Uncertainty (STNUs) provide a useful formalism with which to reason about events and the temporal constraints that apply to them. STNUs are in particular notable because they facilitate reasoning over stochastic, or uncontrollable, actions and their corresponding durations. To evaluate the feasibility of a set of constraints associated with an STNU, one checks the network's \textit{dynamic controllability}, which determines whether an adaptive schedule can be constructed on-the-fly. Our work improves the runtime of checking the dynamic controllability of STNUs with integer bounds to O(min(mn, m sqrt(n) log N) + km + k^2n + kn log n). Our approach pre-processes the STNU using an existing O(n^3) dynamic controllability checking algorithm and provides tighter bounds on its runtime. This makes our work easily adaptable to other algorithms that rely on checking variants of dynamic controllability.

We introduce and study Regular Decision Processes (RDPs), a new, compact, factored model for domains with non-Markovian dynamics and rewards. In RDPs, transition and reward functions are specified using formulas in linear dynamic logic over finite traces, a language with the expressive power of regular expressions. This allows specifying complex dependence on the past using intuitive and compact formulas, and provides a model that generalizes MDPs and k-order MDPs. RDPs can also approximate POMDPs without having to postulate the existence of hidden variables, and, in principle, can be learned from observations only.

We are concerned with the synthesis of strategies for sequential decision-making in non-deterministic dynamical environments where the objective is to satisfy a prescribed temporally extended goal. We frame this task as a Fully Observable Non-Deterministic planning problem with the goal expressed in Linear Temporal Logic (LTL), or LTL interpreted over finite traces (LTLf). While the problem is well-studied theoretically, existing algorithmic solutions typically compute so-called strong-cyclic solutions, which are predicated on an assumption of fairness. In this paper we introduce novel algorithms to compute so-called strong solutions, that guarantee goal satisfaction even in the absence of fairness. Our strategy generation algorithms are complemented with novel mechanisms to obtain proofs of unsolvability. We implemented and evaluated the performance of our approaches in a selection of domains with LTL and LTLf goals.

We study strategy synthesis for partially observable Markov decision processes (POMDPs). The particular problem is to determine strategies that provably adhere to (probabilistic) temporal logic constraints. This problem is computationally intractable and theoretically hard. We propose a novel method that combines techniques from machine learning and formal verification. First, we train a recurrent neural network (RNN) to encode POMDP strategies. The RNN accounts for memory-based decisions without the need to expand the full belief space of a POMDP. Secondly, we restrict the RNN-based strategy to represent a finite-memory strategy and implement it on a specific POMDP. For the resulting finite Markov chain, efficient formal verification techniques provide provable guarantees against temporal logic specifications. If the specification is not satisfied, counterexamples supply diagnostic information. We use this information to improve the strategy by iteratively training the RNN. Numerical experiments show that the proposed method elevates the state of the art in POMDP solving by up to three orders of magnitude in terms of solving times and model sizes.

Online planning methods for partially observable Markov decision processes (POMDPs) have recently gained much interest. In this paper, we propose the introduction of prior knowledge in the form of (probabilistic) relationships among discrete state-variables, for online planning based on the well-known POMCP algorithm. In particular, we propose the use of hard constraint networks and probabilistic Markov random fields to formalize state-variable constraints and we extend the POMCP algorithm to take advantage of these constraints. Results on a case study based on Rocksample show that the usage of this knowledge provides significant improvements to the performance of the algorithm. The extent of this improvement depends on the amount of knowledge encoded in the constraints and reaches the 50% of the average discounted return in the most favorable cases that we analyzed.

This paper revisits probabilistic, model-based goal recognition to study the implications of the use of nominal models to estimate the posterior probability distribution over a finite set of hypothetical goals. Existing model-based approaches rely on expert knowledge to produce symbolic descriptions of the dynamic constraints domain objects are subject to, and these are assumed to produce correct predictions. We abandon this assumption to consider the use of nominal models that are learnt from observations on transitions of systems with unknown dynamics. Leveraging existing work on the acquisition of domain models via learning for Hybrid Planning we adapt and evaluate existing goal recognition approaches to analyze how prediction errors, inherent to system dynamics identification and model learning techniques have an impact over recognition error rates.

Generalized planning aims at computing solutions that work for all instances of the same domain. In this paper, we show that several interesting planning domains possess compact generalized heuristics that can guide a greedy search in guaranteed polynomial time to the goal, and which work for any instance of the domain. These heuristics are weighted sums of state features that capture the number of objects satisfying a certain first-order logic property in any given state. These features have a meaningful interpretation and generalize naturally to the whole domain. Additionally, we present an approach based on mixed integer linear programming to compute such heuristics automatically from the observation of small training instances. We develop two variations of the approach that progressively refine the heuristic as new states are encountered. We illustrate the approach empirically on a number of standard domains, where we show that the generated heuristics will correctly generalize to all possible instances.

We propose an approach to general subgoal-based temporal abstraction in MCTS. Our approach approximates a set of available macro-actions locally for each state only requiring a generative model and a subgoal predicate. For that, we modify the expansion step of MCTS to automatically discover and optimize macro-actions that lead to subgoals. We empirically evaluate the effectiveness, computational efficiency and robustness of our approach w.r.t. different parameter settings in two benchmark domains and compare the results to standard MCTS without temporal abstraction.

We consider mechanisms for the online allocation of perishable resources such as energy or computational power. A main application is electric vehicle charging where agents arrive and leave over time. Unlike previous work, we consider mechanisms without money, and a range of objectives including fairness and efficiency. In doing so, we extend the concept of envy-freeness to online settings. Furthermore, we explore the trade-offs between different objectives and analyse their theoretical properties both in online and offline settings. We then introduce novel online scheduling algorithms and compare them in terms of both their theoretical properties and empirical performance.

We introduce a dynamic logic with parallel composition and two kinds of nondeterministic composition, exclusive and inclusive. We show PSPACE completeness of both the model checking and the satisfiability problem and apply our logic to sequential and parallel classical planning where actions have conditional effects.

Partially Observable Markov Decision Process (POMDP) is a fundamental framework for planning and decision making under uncertainty. POMDP is known to be intractable to solve or even approximate when the planning horizon is long (i.e., within a polynomial number of time steps). Constrained POMDP (C-POMDP) allows constraints to be specified on some aspects of the policy in addition to the objective function. When the constraints involve bounding the probability of failure, the problem is called Chance-Constrained POMDP (CC-POMDP). Our first contribution is a reduction from CC-POMDP to C-POMDP and a novel Integer Linear Programming (ILP) formulation. Thus, any algorithm for the later problem can be utilized to solve any instance of the former. Second, we show that unlike POMDP, when the length of the planning horizon is constant, (C)C-POMDP is NP-Hard. Third, we present the first Fully Polynomial Time Approximation Scheme (FPTAS) that computes (near) optimal deterministic policies for constant-horizon (C)C-POMDP in polynomial time.

Temporal logics are useful for providing concise descriptions of system behavior, and have been successfully used as a language for goal definitions in task planning. Prior works on inferring temporal logic specifications have focused on "summarizing" the input dataset - i.e., finding specifications that are satisfied by all plan traces belonging to the given set. In this paper, we examine the problem of inferring specifications that describe temporal differences between two sets of plan traces. We formalize the concept of providing such contrastive explanations, then present BayesLTL - a Bayesian probabilistic model for inferring contrastive explanations as linear temporal logic (LTL) specifications. We demonstrate the robustness and scalability of our model for inferring accurate specifications from noisy data and across various benchmark planning domains.

Decomposition is a general principle in computational thinking, aiming at decomposing a problem instance into easier subproblems. Indeed, decomposing a transition system into a partitioned transition relation was critical to scaling BDD-based model checking to large state spaces. Since then, it has become a standard technique for dealing with related problems, such as Boolean synthesis. More recently, partitioning has begun to be explored in the synthesis of reactive systems. LTLf synthesis, a finite-horizon version of reactive synthesis with applications in areas such as robotics, seems like a promising candidate for partitioning techniques. After all, the state of the art is based on a BDD-based symbolic algorithm similar to those from model checking, and partitioning could be a potential solution to the current bottleneck of this approach, which is the construction of the state space. In this work, however, we expose fundamental limitations of partitioning that hinder its effective application to symbolic LTLf synthesis. We not only provide evidence for this fact through an extensive experimental evaluation, but also perform an in-depth analysis to identify the reason for these results. We trace the issue to an overall increase in the size of the explored state space, caused by an inability of partitioning to fully exploit state-space minimization, which has a crucial effect on performance. We conclude that more specialized decomposition techniques are needed for LTLf synthesis which take into account the effects of minimization.

We propose Stable Yet Memory Bounded Open-Loop (SYMBOL) planning, a general memory bounded approach to partially observable open-loop planning. SYMBOL maintains an adaptive stack of Thompson Sampling bandits, whose size is bounded by the planning horizon and can be automatically adapted according to the underlying domain without any prior domain knowledge beyond a generative model. We empirically test SYMBOL in four large POMDP benchmark problems to demonstrate its effectiveness and robustness w.r.t. the choice of hyperparameters and evaluate its adaptive memory consumption. We also compare its performance with other open-loop planning algorithms and POMCP.

Feature-engineering-based machine learning models and deep learning models have been explored for wearable-sensor-based human activity recognition. For both types of methods, one crucial research issue is how to extract proper features from the partitioned segments of multivariate sensor readings. Existing methods have different drawbacks: 1) feature-engineering-based methods are able to extract meaningful features, such as statistical or structural information underlying the segments, but usually require manual designs of features for different applications, which is time consuming, and 2) deep learning models are able to learn temporal and/or spatial features from the sensor data automatically, but fail to capture statistical information. In this paper, we propose a novel deep learning model to automatically learn meaningful features including statistical features, temporal features and spatial correlation features for activity recognition in a unified framework. Extensive experiments are conducted on four datasets to demonstrate the effectiveness of our proposed method compared with state-of-the-art baselines.

Pattern databases are the foundation of some of the strongest admissible heuristics for optimal classical planning. Experiments showed that the most informative way of combining information from multiple pattern databases is to use saturated cost partitioning. Previous work selected patterns and computed saturated cost partitionings over the resulting pattern database heuristics in two separate steps. We introduce a new method that uses saturated cost partitioning to select patterns and show that it outperforms all existing pattern selection algorithms.

Many real-world scheduling problems are characterized by uncertain parameters. In this paper, we study a classical parallel machine scheduling problem where the processing time of jobs is given by a normal distribution. The objective is to maximize the probability that jobs are completed before a given common due date. This study focuses on the computational aspect of this problem, and it proposes a Branch-and-Price approach for solving it. The advantage of our method is that it scales very well with the increasing number of machines and is easy to implement. Furthermore, we propose an efficient lower bound heuristics. The experimental results show that our method outperforms the existing approaches.

Pickup-and-Delivery (PD) problems consider routing vehicles to achieve a set of tasks related to ``Pickup'', and to ``Delivery''. Meanwhile these tasks might subject to Precedence Constraints (PDPC) or Time Windows (PDTW). PD is a variant to Vehicle Routing Problems (VRP), which have been extensively studied for decades. In the recent years, PD demonstrates its closer relevance to AI. With an awareness that few work has been dedicated so far in addressing where the tractability boundary line can be drawn for PD problems, we identify in this paper a set of highly restricted PD problems and prove their NP-completeness. Many problems from a multitude of applications and industry domains are general versions of PDPC. Thus this new result of NP-hardness, of PDPC, not only clarifies the computational complexity of these problems, but also sets up a firm base for the requirement on use of approximation or heuristics, as opposed to looking for exact but intractable algorithms for solving them. We move on to perform an empirical study to locate sources of intractability in PD problems. That is, we propose a local-search formalism and algorithm for solving PDPC problems in particular. Experimental results support strongly effectiveness and efficiency of the local-search. Using the local-search as a solver for randomly generated PDPC problem instances, we obtained interesting and potentially useful insights regarding computational hardness of PDPC and PD.

The performance of domain-independent planning systems heavily depends on how the planning task has been modeled. This makes task reformulation an important tool to get rid of unnecessary complexity and increase the robustness of planners with respect to the model chosen by the user. In this paper, we represent tasks as factored transition systems (FTS), and use the merge-and-shrink (M&S) framework for task reformulation for optimal and satisficing planning. We prove that the flexibility of the underlying representation makes the M&S reformulation methods more powerful than the counterparts based on the more popular finite-domain representation. We adapt delete-relaxation and M&S heuristics to work on the FTS representation and evaluate the impact of our reformulation.

In this paper, we introduce the Steady-State Policy Synthesis (SSPS) problem which consists of ﬁnding a stochastic decision-making policy that maximizes expected rewards while satisfying a set of asymptotic behavioral speciﬁcations. These speciﬁcations are determined by the steady-state probability distribution resulting from the Markov chain induced by a given policy. Since such distributions necessitate recurrence, we propose a solution which ﬁnds policies that induce recurrent Markov chains within possibly non-recurrent Markov Decision Processes (MDPs). The SSPS problem functions as a generalization of steady-state control, which has been shown to be in PSPACE. We improve upon this result by showing that SSPS is in P via linear programming. Our results are validated using CPLEX simulations on MDPs with over 10000 states. We also prove that the deterministic variant of SSPS is NP-hard.