Optimal heuristic search has been successful in many domains, including journey planning, route planning, and puzzle solving. Existing work typically assumes that the cost of each action can easily be obtained. In many problems, however, the exact edge cost is expensive to compute, and existing search algorithms face a significant performance bottleneck due to the overhead of dynamically calculating exact edge costs. We present DEA*, an algorithm for problems with expensive edge cost computations. DEA* combines heuristic edge cost evaluations with delayed node expansions, reducing the number of exact edge cost computations. We formally prove that DEA* is optimal and efficient with respect to the number of exact edge cost computations. We empirically evaluate DEA* on multiple-worker routing problems where the exact edge cost is calculated by invoking an external multi-modal journey planning engine. The results demonstrate the effectiveness of our ideas in reducing computation time and improving solving ability. In addition, we show the advantages of DEA* in domain-independent planning, where we simulate the setting in which accurate edge costs are expensive to compute.
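The core idea of combining cheap edge cost estimates with delayed expansion can be pictured with a small sketch. The code below is not the paper's DEA* implementation; the interface (estimate_edge, exact_edge) and the lazy re-queueing policy are illustrative assumptions, and optimality relies on the estimate being a lower bound on the exact cost.

```python
import heapq
import itertools

def lazy_edge_search(start, is_goal, successors, h, estimate_edge, exact_edge):
    """Illustrative sketch of delayed exact edge-cost evaluation (not the
    paper's DEA* code).  Assumes estimate_edge(u, v) is a cheap lower bound
    on exact_edge(u, v) and h is admissible, so the first goal popped with an
    exact incoming cost is optimal."""
    tie = itertools.count()  # tie-breaker so states are never compared directly
    # Entries: (f, tie, g, state, parent, incoming_edge_is_exact)
    open_list = [(h(start), next(tie), 0.0, start, None, True)]
    closed = {}
    while open_list:
        f, _, g, s, parent, exact = heapq.heappop(open_list)
        if not exact:
            # Only now pay for the expensive exact edge cost, then re-queue.
            g = g + exact_edge(parent, s)
            heapq.heappush(open_list, (g + h(s), next(tie), g, s, parent, True))
            continue
        if s in closed and closed[s] <= g:
            continue
        closed[s] = g
        if is_goal(s):
            return g
        for t in successors(s):
            # Successors enter the queue with a cheap estimated edge cost.
            f_est = g + estimate_edge(s, t) + h(t)
            heapq.heappush(open_list, (f_est, next(tie), g, t, s, False))
    return None
```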
Omega-regular objectives in Markov decision processes (MDPs) reduce to reachability: find a policy which maximizes the probability of reaching a target set of states. Given an MDP, an initial distribution, and a target set of states, such a policy can be computed by most probabilistic model checking tools. If the MDP is only partially specified, i.e., some probabilities are unknown, then model-learning techniques can be used to statistically approximate the probabilities and enable the computation of the desired policy. For fully specified MDPs, reducing the size of the MDP translates into faster model checking; for partially specified MDPs, into faster learning. We provide reduction techniques that allow us to remove irrelevant transition probabilities: transition probabilities (known, or to be learned) that do not influence the maximal reachability probability. Among other applications, these reductions can be seen as a pre-processing of MDPs before model checking or as a way to reduce the number of experiments required to obtain a good approximation of an unknown MDP.
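One simple instance of such a reduction can be sketched under generic assumptions about the MDP representation: states from which the target set is unreachable in the underlying graph have maximal reachability value 0 regardless of the exact probabilities, so their outgoing transition probabilities never need to be known or learned. The sketch below identifies those states; the dictionary-based MDP encoding is an assumption for illustration.

```python
def irrelevant_states(transitions, targets):
    """Sketch of a basic reduction: find states that cannot reach the target
    set in the underlying graph.  `transitions[s][a]` is assumed to be an
    iterable of successor states (probabilities may be unknown); `targets`
    is the target state set."""
    # Build the reversed edge relation of the underlying graph.
    preds = {}
    for s, acts in transitions.items():
        for succs in acts.values():
            for t in succs:
                preds.setdefault(t, set()).add(s)
    # Backward search from the targets: everything reached can reach a target.
    can_reach = set(targets)
    stack = list(targets)
    while stack:
        t = stack.pop()
        for s in preds.get(t, ()):
            if s not in can_reach:
                can_reach.add(s)
                stack.append(s)
    # Remaining states have reachability probability 0 under every policy,
    # so their transition probabilities are irrelevant.
    return set(transitions) - can_reach
```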
In this paper, we focus on speeding up the temporal plan relaxation problem for dynamically controllable systems. We take a look at the current best-known algorithm for determining dynamic controllability and augment it to efficiently generate conflicts when the network is deemed uncontrollable. Our work preserves the O(n^3) runtime of the best available dynamic controllability checker and improves on the previous best runtime of O(n^4) for extracting dynamic controllability conflicts. We then turn our attention to temporal plan relaxation tasks and show how we can leverage our work on conflicts and the structure of the network to efficiently make incremental updates intended to restore dynamic controllability by relaxing constraints. Our new algorithm, RelaxIDC, has the same asymptotic runtime as previous algorithms but sees dramatic empirical improvements over the course of repeated dynamic controllability checks.
In many planning applications, actions can have highly diverse costs. Recent studies focus on the effects of diverse action costs on search algorithms, but not on their effects on domain-independent heuristics. In this paper, we demonstrate that action cost diversity can negatively affect merge-and-shrink (M&S), a successful abstraction method for producing high-quality heuristics for planning problems. We propose a new cost partitioning method for M&S to address the negative effects of diverse action costs. We investigate non-unit cost IPC domains, especially those for which diverse action costs have severe negative effects on the quality of the M&S heuristic. Our experiments demonstrate that in these domains, an additive set of M&S heuristics using the new cost partitioning method produces much more informative and effective heuristics than creating a single M&S heuristic which directly encodes diverse costs.
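The general principle behind cost partitioning (not the specific scheme proposed in this paper) is easy to state: if each action's cost is split among several abstractions so that the per-abstraction costs sum to at most the original cost, the abstract heuristic values can be added while remaining admissible. A minimal uniform-split sketch, with assumed dictionary inputs:

```python
def uniform_cost_partition(action_costs, num_abstractions):
    """Illustrative uniform cost partitioning (not the paper's method):
    each action's cost is split evenly across the abstractions, so that the
    per-abstraction costs of every action sum to its original cost."""
    return [
        {a: c / num_abstractions for a, c in action_costs.items()}
        for _ in range(num_abstractions)
    ]

def additive_heuristic(abstract_heuristics, state):
    """The sum of heuristics computed under a cost partition is admissible."""
    return sum(h(state) for h in abstract_heuristics)
```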
Classical planning is concerned with problems where a goal needs to be reached from a known initial state by doing actions with deterministic, known effects. Classical planners, however, deal only with classical problems that can be expressed in declarative planning languages such as STRIPS or PDDL. This prevents their use on problems that are not easy to model declaratively or whose dynamics are given via simulations. Simulators do not provide a declarative representation of actions, but simply return successor states. The question we address in this paper is: can a planner that has access only to the structure of states and goals approach the performance of planners that also have access to the structure of actions expressed in PDDL? To answer this, we develop domain-independent, black box planning algorithms that completely ignore action structure, and show that they match the performance of state-of-the-art classical planners on the standard planning benchmarks. Effective black box algorithms open up new possibilities for modeling and for expressing control knowledge, which we also illustrate.
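The flavor of a planner that exploits only state structure can be illustrated with a width-1 style, novelty-based breadth-first search over a simulator. The sketch below is an illustration of the idea, not the paper's exact algorithm; the simulator interface and the state-as-set-of-atoms representation are assumptions.

```python
from collections import deque

def novelty_bfs(initial_state, is_goal, successors):
    """Black-box, width-1 style search sketch: states are sets of atoms
    (structure of states), actions are opaque and come only from the
    simulator's successors(state) -> iterable of (action, next_state).
    A state is pruned unless it makes some atom true for the first time."""
    seen_atoms = set(initial_state)
    queue = deque([(initial_state, [])])
    while queue:
        state, plan = queue.popleft()
        if is_goal(state):
            return plan
        for action, nxt in successors(state):
            new_atoms = [a for a in nxt if a not in seen_atoms]
            if not new_atoms:
                continue            # prune: no novel atom in this state
            seen_atoms.update(new_atoms)
            queue.append((nxt, plan + [action]))
    return None
```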
A pattern database (PDB) for a planning task is a heuristic function in the form of a lookup table that contains optimal solution costs of a simplified version of the task. In this paper we introduce a method that sequentially creates multiple PDBs which are later combined into a single heuristic function. At a given iteration, our method uses estimates of the A* running time to create a PDB that complements the strengths of the PDBs created in previous iterations. We evaluate our algorithm using explicit and symbolic PDBs. Our results show that the heuristics produced by our approach are able to outperform existing schemes, and that our method is able to create PDBs that complement the strengths of other existing heuristics such as a symbolic perimeter heuristic.
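The iterative construction can be pictured as a greedy loop: at each step, a candidate PDB is kept only if it is predicted to reduce A* search effort given the PDBs already collected. In the sketch below, build_pdb and predict_runtime are placeholders for the paper's components, since the concrete runtime estimates are not reproduced here.

```python
def sequential_pdb_selection(candidate_patterns, build_pdb, predict_runtime,
                             max_pdbs):
    """Greedy sketch of building complementary PDBs one at a time.
    build_pdb(pattern) returns a heuristic lookup; predict_runtime(pdbs) is
    an (assumed) estimate of A* running time when the maximum over the given
    PDBs is used as the heuristic."""
    selected = []
    best_estimate = predict_runtime(selected)
    for _ in range(max_pdbs):
        best_candidate = None
        for pattern in candidate_patterns:
            pdb = build_pdb(pattern)
            estimate = predict_runtime(selected + [pdb])
            if estimate < best_estimate:
                best_estimate, best_candidate = estimate, pdb
        if best_candidate is None:
            break          # no candidate complements the current collection
        selected.append(best_candidate)
    return selected
```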
Star-topology decoupling is a recent search reduction method for forward state space search. The basic idea is to automatically identify a star factoring, then search only over the center component in the star, avoiding interleavings across leaf components. The framework can handle complex star topologies, yet prior work on decoupled search considered only factoring strategies identifying fork and inverted-fork topologies. Here, we introduce factoring strategies able to detect general star topologies, thereby extending the reach of decoupled search to new factorings and to new domains, sometimes resulting in significant performance improvements. Furthermore, we introduce a predictive portfolio method that reliably selects the most suitable factoring for a given planning task, leading to superior overall performance.
We propose a new method for conformant planning based on two ideas. First, given a small sample of the initial belief state, we reduce conformant planning for this sample to a classical planning problem, giving us a candidate solution. Second, we exploit regression as a way to compactly represent necessary conditions for such a solution to be valid in the non-deterministic setting. If necessary, we use the resulting formula to extract a counter-example that is added to our next sample. Our experiments show that this approach is competitive on a class of problems that are hard for traditional planners, and it also returns generally shorter plans. We are also able to demonstrate the unsatisfiability of some problems.
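At a high level this is a sample-plan-check loop. The sketch below shows only the control flow; classical_plan, regress_conditions, and find_counterexample are assumed helpers standing in for the paper's actual components.

```python
def sample_based_conformant_planning(initial_belief_sample, classical_plan,
                                     regress_conditions, find_counterexample,
                                     max_rounds=100):
    """Control-flow sketch of the sample/plan/regress/refine loop described
    in the abstract.  The helper functions are illustrative assumptions."""
    sample = list(initial_belief_sample)
    for _ in range(max_rounds):
        # 1. Solve the classical problem induced by the current sample.
        candidate = classical_plan(sample)
        if candidate is None:
            return None                   # the sample alone is unsolvable
        # 2. Regress the goal through the candidate plan to obtain conditions
        #    the full initial belief state must satisfy.
        conditions = regress_conditions(candidate)
        # 3. Look for an initial state violating those conditions.
        counterexample = find_counterexample(conditions)
        if counterexample is None:
            return candidate              # valid for every initial state
        sample.append(counterexample)     # refine the sample and repeat
    return None
```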
We consider the problem of minimizing the delay of jobs moving through a directed graph of service nodes. In this problem, each node may have several links and is constrained to serve one link at a time. As jobs move through the network, they can pass through a node only after they have been serviced by that node. The objective is to minimize the delay jobs incur waiting in queues to be serviced. Two popular approaches to this problem are the backpressure algorithm and schedule-driven control. In this paper, we present a hybrid of these two methods that incorporates the stability guarantees of queueing theory into schedule-driven control. We then demonstrate how this hybrid method outperforms the other two in a real-time traffic signal control problem, where the nodes are traffic lights, the links are roads, and the jobs are vehicles. We show through simulations that, in scenarios with heavy congestion, the hybrid method reduces delay by 50% and 15% relative to schedule-driven control and backpressure, respectively. A theoretical analysis also justifies our results.
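For reference, the backpressure component alone can be written in a few lines: each phase is scored by the queue-length difference (pressure) across the movements it serves, and the phase with the highest total pressure is actuated. This is the textbook rule, shown only for context, not the hybrid scheduler proposed in the paper.

```python
def backpressure_phase(queue_length, phases):
    """Textbook backpressure rule (for comparison, not the proposed hybrid).

    `phases` maps a phase id to the list of (upstream, downstream) movements
    it serves; `queue_length` maps a link to its current queue length.  The
    pressure of a movement (u, v) is queue_length[u] - queue_length[v]."""
    def pressure(movements):
        return sum(queue_length[u] - queue_length.get(v, 0)
                   for u, v in movements)
    # Actuate the phase whose served movements have the largest total pressure.
    return max(phases, key=lambda p: pressure(phases[p]))
```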
Multi-robot navigation control in the absence of a reference trajectory is challenging, as it is expected to ensure stability and feasibility while still offering fast computation of control decisions. The intrinsic complexity of robots with switched linear dynamics makes the problem even more challenging. In this paper, we propose a novel HMPC-based method to address the navigation problem of multiple robots with switched linear dynamics. We develop a new technique to compute the reachable sets of switched linear systems and use them to enable the parallel computation of control parameters. We present theoretical results on the stability, feasibility, and complexity of the proposed approach, and demonstrate its empirical performance advantages over other approaches.
The real-world application of planning techniques often requires models with numeric fluents. However, these fluents are not directly supported by most planners and heuristics. We describe a family of planning algorithms that takes a numeric planning problem and produces an abstracted representation that can be solved using any classical planner. The resulting abstract plan is generalized into a policy and then used to guide the search in the original numeric domain. We prove that our approach is sound, and we evaluate it on a set of standard benchmarks. We show that it can provide competitive performance when compared to other well-known algorithms for numeric planning, and a significant performance improvement in certain domains.
This paper proposes a novel direct policy search (DPS) method with model selection for partially observed Markov decision processes (POMDPs). DPS methods have become standard for learning POMDPs due to their computational efficiency and natural ability to maximize total rewards. An important open challenge for the best use of DPS methods is model selection, i.e., determining the proper dimensionality of hidden states and the complexity of policy functions, to mitigate overfitting in highly flexible model representations of POMDPs. This paper bridges Bayesian inference and reward maximization and derives the marginalized weighted log-likelihood (MWL) for POMDPs, which combines the advantages of Bayesian model selection and DPS. We then propose factorized asymptotic Bayesian policy search (FABPS) to explore the model and the policy that maximize MWL, by extending recently developed factorized asymptotic Bayesian inference. Experimental results show that FABPS outperforms state-of-the-art model selection methods for POMDPs, with respect both to model selection and to expected total rewards.
We present the Equi Reward Utility Maximizing Design (ER-UMD) problem for redesigning stochastic environments to maximize agent performance. ER-UMD fits well with contemporary applications that require offline design of environments in which robots and humans act and cooperate. To find an optimal modification sequence we present two novel solution techniques: a compilation that embeds design into a planning problem, allowing use of off-the-shelf solvers to find a solution, and a heuristic search in the space of modifications, for which we present an admissible heuristic. Evaluation shows the feasibility of the approach using standard benchmarks from the probabilistic planning competition and a benchmark we created for a vacuum cleaning robot setting.
In this paper, we propose a novel integrated task planning system for service robots in domestic domains. Given open-ended high-level user instructions in natural language, robots need to generate a plan, i.e., a sequence of low-level executable actions, to complete the required tasks. To address this, we exploit the knowledge on semantic roles of common verbs defined in semantic dictionaries such as FrameNet and integrate it with Answer Set Programming, a task planning framework with both a representation language and solvers. In the experiments, we evaluated our approach using common benchmarks on service tasks and showed that it can successfully handle many more tasks than the state-of-the-art solution. Notably, we deployed the proposed planning system on our service robot for the annual RoboCup@Home competitions and achieved very encouraging results.
Deceptive path-planning involves finding a path such that the probability of an observer identifying the path's final destination - before it has been reached - is minimised. This paper formalises deception as it applies to path-planning and introduces the notion of a last deceptive point (LDP) which, when measured in terms of 'path completion', can be used to rank paths by their potential to deceive. Building on recent developments in probabilistic goal-recognition, we propose a formula to calculate an optimal LDP and present strategies for the generation of deceptive paths by both simulation ('showing the false') and dissimulation ('hiding the real').
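The observer that such deceptive paths are planned against is typically a probabilistic goal-recognition model: a Bayesian posterior over candidate goals given the observed path prefix. The cost-difference formulation sketched below (in the spirit of Ramirez and Geffner) is shown purely as an illustration of that kind of observer, not as the paper's exact formula; the temperature parameter beta is an assumption.

```python
import math

def goal_posterior(cost_with, cost_without, priors, beta=1.0):
    """Illustrative cost-difference goal-recognition posterior.

    For each candidate goal g, `cost_with[g]` is the cheapest cost of reaching
    g while complying with the observed path prefix, and `cost_without[g]` the
    cheapest cost ignoring it; a small difference makes g a likely explanation.
    `priors` maps each goal to its prior probability."""
    likelihood = {
        g: math.exp(-beta * (cost_with[g] - cost_without[g])) for g in priors
    }
    z = sum(likelihood[g] * priors[g] for g in priors)
    return {g: likelihood[g] * priors[g] / z for g in priors}
```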
A domain-independent heuristic function created by an abstraction is usually implemented using a Pattern Database (PDB), which is a lookup table of (abstract state, heuristic value) pairs. PDBs containing high quality heuristic values generally require substantial memory space and therefore need to be compressed. In this paper, we introduce Acyclic Random Hypergraph Compression (ARHC), a domain-independent approach to compressing PDBs using acyclic random r-partite r-uniform hypergraphs. The ARHC algorithm, which comes in Base and Extended versions, provides fast lookup and a high compression rate. ARHC-Extended achieves higher quality heuristics than ARHC-Base by decreasing the heuristic information loss at the cost of some decrease in the compression rate. ARHC shows higher performance than level-by-level Bloom filter PDB compression in all experiments conducted so far.
The paper generalises the notion of landmarks for reasoning about planning problems involving propositional and numeric variables. Intuitively, numeric landmarks are regions in the metric space defined by the problem whose crossing is necessary for its resolution. The paper proposes a relaxation-based method for their automated extraction directly from the problem structure, and shows how to exploit them to infer what we call disjunctive and additive hybrid action landmarks. The justification of such a disjunctive representation results from the intertwined propositional and numeric structure of the problem. The paper demonstrates their use in two novel admissible LP-based numeric heuristics, and reports experiments on cost-optimal numeric planning problems. Results show the heuristics are more informed and effective than previous work for problems involving a higher number of (sub)goals.
This paper presents a novel approach for generating Context-Free Grammars (CFGs) from small sets of input strings (a single input string in some cases). Our approach is to compile this task into a classical planning problem whose solutions are sequences of actions that build and validate a CFG compliant with the input strings. In addition, we show that our compilation is suitable for implementing the two canonical tasks for CFGs, string production and string recognition.
A key technique for proving unsolvability in classical planning is the use of dead-end detectors Δ: effectively testable criteria sufficient for unsolvability, pruning (some) unsolvable states during search. Related to this, a recent proposal is the identification of traps prior to search: compact representations of non-goal state sets T that cannot be escaped. Here, we create new synergy across these ideas. We define a generalized concept of traps, relative to a given dead-end detector Δ, where T can be escaped, but only into dead-end states detected by Δ. We show how to learn compact representations of such T during search, extending the reach of Δ. Our experiments show that this can be quite beneficial. It improves coverage for many unsolvable benchmark planning domains and dead-end detectors Δ, in particular on resource-constrained domains where it outperforms the state of the art.
In this paper we explore the theoretical boundaries of planning in a setting where no model of the agent's actions is given. Instead of an action model, a set of successfully executed plans is given, and the task is to generate a plan that is safe, i.e., guaranteed to achieve the goal without failing. To this end, we show how to learn a conservative model of the world in which actions are guaranteed to be applicable. This conservative model is then given to an off-the-shelf classical planner, resulting in a plan that is guaranteed to achieve the goal. However, this reduction from model-free planning to model-based planning is not complete: in some cases a plan will not be found even when one exists. We analyze the relation between the number of observed plans and the likelihood that our conservative approach will indeed fail to solve a solvable problem. Our analysis shows that the number of trajectories needed scales gracefully.
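The conservative idea can be illustrated as follows: an action's learned precondition is everything that was true every time the action was observed (so it is at least as restrictive as the real precondition), and its learned effects are the changes observed in the traces. The sketch assumes deterministic STRIPS-style actions and states represented as sets of ground literals; the paper's algorithm imposes further conditions to guarantee safety, which this sketch does not reproduce.

```python
def learn_conservative_model(traces):
    """Sketch of building a conservative action model from observed
    (pre_state, action, post_state) triples, with states as frozensets of
    ground literals.  Preconditions are the intersection of all observed
    pre-states, so any state satisfying the learned precondition also
    satisfies the real (unknown) one."""
    preconds, adds, dels = {}, {}, {}
    for pre, action, post in traces:
        if action not in preconds:
            preconds[action] = set(pre)
            adds[action] = set()
            dels[action] = set()
        else:
            preconds[action] &= pre      # keep only literals true every time
        adds[action] |= post - pre       # observed add effects
        dels[action] |= pre - post       # observed delete effects
    return preconds, adds, dels
```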
This paper focuses on a generalization of the traveling salesman problem (TSP), called the subpath planning problem (SPP). Given 2n vertices and n independent edges in a metric space, we aim to find a shortest tour that contains all the edges. SPP is one of the fundamental problems in both artificial intelligence and robotics. Our main result is a 1.5-approximation algorithm that runs in polynomial time, improving on the currently best approximation algorithm. The idea is a direct application of techniques developed for the TSP. In addition, we propose a generalization of SPP called the subgroup planning problem (SGPP). In this problem, we are given a set of disjoint groups of vertices, and we aim to find a shortest tour such that all the vertices in each group are traversed sequentially. We propose a 3-approximation algorithm for SGPP. We also conduct numerical experiments. Compared with previous algorithms, our algorithms improve the solution quality by more than 10% for large instances with more than 10,000 vertices.
With the rapid growth of e-commerce and the World Wide Web, internet advertising revenue has recently surpassed broadcast revenue. As online advertising has become a major source of revenue for online publishers, such as Google and Amazon, one problem facing them is how to optimize ad selection and allocation in order to maximize their revenue. Although a rich body of work has been devoted to this field, uncertainty about models and parameter settings is largely ignored in existing algorithm design. To fill this gap, we are the first to formulate and study the Robust Ad Allocation problem, which takes into account uncertainty about parameter settings. We define a Robust Ad Allocation framework with a set of candidate parameter settings, typically derived from different users or topics. Our main aim is to develop robust ad allocation algorithms that provide satisfactory performance across a spectrum of parameter settings, compared to the (parameter-specific) optimum solutions. We study this problem progressively and propose a series of algorithms with bounded approximation ratios.
Dominance relations compare states to determine whether one is at least as good as another in terms of goal distance. We generalize these qualitative yes/no relations to functions that measure by how much one state is better than another. This allows us to distinguish cases where a state is strictly closer to the goal. Moreover, we may obtain a bound on the difference in goal distance between two states even if there is no qualitative dominance. We analyze the multiple advantages of quantitative dominance, such as discovering coarser dominance relations or trading dominance for g-value. Moreover, quantitative dominance can also be used to prove that an action starts an optimal plan from a given state. We introduce a novel action selection pruning that uses this to prune all other successors. Results show that quantitative dominance pruning greatly reduces the search space, significantly increasing the planners' performance.
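To see how a quantitative dominance function enables pruning, suppose D(s, t) is a lower bound on h*(s) − h*(t), i.e., on how much closer to the goal t is than s. A newly generated state s can then be discarded whenever some previously seen state t satisfies g(t) ≤ g(s) + D(s, t), since t then leads to a solution that is at least as cheap. The sketch below implements exactly this check; the dominance function D itself is assumed to be given, and tie handling is omitted.

```python
def dominance_prune(candidates, new_state, g, D):
    """Illustrative pruning test with a quantitative dominance function.

    Assumption: D(s, t) <= h*(s) - h*(t), i.e. t is at least D(s, t) closer
    to the goal than s.  If some stored state t satisfies
        g[t] <= g[new_state] + D(new_state, t),
    then g[t] + h*(t) <= g[new_state] + h*(new_state), so new_state can be
    pruned (ties must be broken carefully in practice)."""
    return any(
        g[t] <= g[new_state] + D(new_state, t)
        for t in candidates
    )
```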
Organizing large-scale projects (e.g., conferences, IT shows, F1 races) requires precise scheduling of multiple dependent tasks on common resources, where multiple selfish entities compete to execute the individual tasks. In this paper, we consider a well-studied and rich scheduling model referred to as RCPSP (Resource Constrained Project Scheduling Problem). The key change to this model that we consider in this paper is the presence of selfish entities competing to perform individual tasks with the aim of maximizing their own utility. Due to the selfish entities in play, the goal of the scheduling problem is no longer only to minimize the makespan of the entire project, but rather to maximize social welfare while ensuring incentive compatibility and economic efficiency. We show that the traditional VCG mechanism is not incentive compatible in this context and hence provide two new practical mechanisms that extend VCG. These new mechanisms, referred to as Individual Completion based Payments (ICP) and Social Completion based Payments (SCP), provide strong theoretical properties including strategyproofness.
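As background for the two payment schemes, the standard VCG payment that they extend charges each agent the externality it imposes on the others. The generic sketch below shows that textbook rule only; it is the baseline the abstract argues is not incentive compatible in this setting, not ICP or SCP themselves.

```python
def vcg_payments(agents, welfare_of_others_without, welfare_of_others_with):
    """Generic VCG (Clarke pivot) payment rule, shown as background only.

    For each agent i, the payment is the best welfare the other agents could
    achieve if i were absent, minus the welfare the others actually obtain in
    the chosen outcome.  Both quantities are assumed to be precomputed and
    passed in as dictionaries keyed by agent."""
    return {
        i: welfare_of_others_without[i] - welfare_of_others_with[i]
        for i in agents
    }
```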
We investigate the application of temporal planners to the problem of compiling quantum circuits to emerging quantum hardware. While our approach is general, we focus our initial experiments on Quantum Approximate Optimization Algorithm (QAOA) circuits that have few ordering constraints and thus allow highly parallel plans. We report on experiments using several temporal planners to compile circuits of various sizes to a realistic hardware architecture. This early empirical evaluation suggests that temporal planning is a viable approach to quantum circuit compilation.