Scheduling is adopted in various domains to assign jobs to resources such that an objective is optimized. While schedules enable the analysis of the underlying system, publishing them also incurs a privacy risk. Recently, privacy attacks on schedules have been proposed that may reveal sensitive information about the jobs by solving an inverse scheduling problem. In this work, we study protection against such attacks. We formulate the problem of privacy-and-utility preservation of schedules, which bounds both the privacy leakage and the loss in the utility of the schedule due to obfuscation. We address the problem based on a set of perturbation functions for schedules, study their instantiations for standard scheduling problems, and implement privacy-and-utility-aware publishing of a schedule using constraint programming. Experiments with synthetic and real-world schedules demonstrate the feasibility, robustness, and effectiveness of our mechanism.
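To make the notion of a perturbation function concrete, here is a minimal Python sketch of one conceivable instantiation; the jitter function, the makespan-based utility, and all names are our illustrative assumptions, not the paper's mechanism:

```python
import random

def jitter_schedule(starts, durations, max_shift, utility_loss_bound, seed=0):
    """Perturb job start times by a bounded random shift, rejecting
    perturbations that degrade the makespan (a stand-in utility measure)
    beyond the allowed bound. All names here are illustrative."""
    rng = random.Random(seed)
    base_makespan = max(s + d for s, d in zip(starts, durations))
    while True:
        perturbed = [max(0, s + rng.randint(-max_shift, max_shift)) for s in starts]
        makespan = max(s + d for s, d in zip(perturbed, durations))
        if makespan - base_makespan <= utility_loss_bound:
            return perturbed

# Three jobs with durations 4, 2, 3; allow shifts of +/-2 and at most 1 unit of makespan loss.
print(jitter_schedule([0, 4, 6], [4, 2, 3], max_shift=2, utility_loss_bound=1))
```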
Partially observable Markov decision processes (POMDPs) form a prominent model for uncertainty in sequential decision making. We are interested in constructing algorithms with theoretical guarantees that determine whether the agent has a strategy ensuring a given specification with probability 1. This well-studied problem is known to be undecidable even for very simple omega-regular objectives, because of the difficulty of reasoning about uncertain events. We introduce a revelation mechanism that restricts information loss by requiring that, almost surely, the agent eventually has full information about the current state. Our main technical results are exact algorithms for two classes of POMDPs, called weakly and strongly revealing. Importantly, the decidable cases reduce to the analysis of a finite belief-support Markov decision process. This yields a conceptually simple and exact algorithm for a large class of POMDPs.
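The belief-support reduction can be made concrete in a few lines; the following Python sketch of the support update is illustrative (the data layout is an assumption, not taken from the paper):

```python
def belief_support_successor(support, action, obs, trans, obs_fn):
    """Successor belief support in a POMDP: states reachable from the
    current support under `action` that are compatible with `obs`.
    `trans[s][a]` is the set of possible successors of s under a;
    `obs_fn[s]` is the set of observations possible in state s.
    (Illustrative encoding, not tied to any particular tool.)"""
    return frozenset(
        t
        for s in support
        for t in trans[s][action]
        if obs in obs_fn[t]
    )

# Tiny example: two states, one action 'a'; observation 'o1' reveals state 1.
trans = {0: {'a': {0, 1}}, 1: {'a': {1}}}
obs_fn = {0: {'o0'}, 1: {'o1'}}
print(belief_support_successor(frozenset({0}), 'a', 'o1', trans, obs_fn))  # frozenset({1})
```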
We design sensitivity oracles for error-prone networks. For a network problem Π, the data structure preprocesses a network G=(V,E) and a sensitivity parameter f such that, for any set F of up to f link or node failures, it can report the solution of Π in G-F. We study three network problems Π:
- L-Hop Shortest Path: Given s,t in V, is there a shortest s-t-path in G-F with at most L links?
- k-Path: Does G-F contain a simple path with k links?
- k-Clique: Does G-F contain a clique of k nodes?

Our main technical contribution is a new construction of (L,f)-replacement path coverings ((L,f)-RPC) in the parameter regime f = o(log L). An (L,f)-RPC is a family G' of subnetworks of G which, for every set F of at most f links, has a subfamily G'_F such that (i) no subnetwork in G'_F contains a link of F and (ii) for each s,t in V, if G-F contains a shortest s-t-path with at most L links, then some subnetwork in G'_F retains at least one such path. Our (L,f)-RPC has almost the same size as the one by Weimann and Yuster (2013) but improves the time to query G'_F from Õ(f^2 L^f) to Õ(f^(5/2) L^o(1)). It also improves over the size and query time of the (L,f)-RPC by Karthik and Parter (2021) by nearly a factor of L. From this construction, we derive oracles for L-Hop Shortest Path, k-Path, and k-Clique. Notably, our solution for k-Path improves the query time of the one by Bilò for f = o(log k).
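For intuition, here is a toy Python rendering of the classical random-sampling idea behind (L,f)-RPCs, in the spirit of the Weimann and Yuster construction cited above rather than the improved one contributed here (all parameters are illustrative):

```python
import random

def sample_rpc(edges, L, f, num_subgraphs, seed=0):
    """Toy (L,f)-replacement-path-covering via random sampling: each
    subgraph drops every edge independently with probability 1/L. A fixed
    failure set of <= f edges is fully removed from one subgraph with
    probability about (1/L)^f, while a fixed path of <= L edges survives
    with probability about (1 - 1/L)^L, so enough samples cover all cases."""
    rng = random.Random(seed)
    return [
        frozenset(e for e in edges if rng.random() >= 1.0 / L)
        for _ in range(num_subgraphs)
    ]

def query(subgraphs, failures):
    """Subfamily of subgraphs containing no failed edge."""
    return [g for g in subgraphs if not (g & failures)]

edges = {(u, v) for u in range(6) for v in range(u + 1, 6)}
family = sample_rpc(edges, L=4, f=1, num_subgraphs=50)
print(len(query(family, failures=frozenset({(0, 1)}))))
```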
It is well known that numeric planning can be made decidable if the domain of every numeric state variable is finite. This bounded formulation can be polynomially compiled into classical planning with Boolean conditions and conditional effects, preserving the plan size exactly. However, it remains unclear whether this compilation has any practical utility. To explore this aspect, this work revisits the theoretical compilation framework from a practical perspective, focusing on the fragment of simple numeric planning. Specifically, we introduce three different compilations. The first, called one-hot, systematises the current practice among planning practitioners of modelling numeric planning through classical planning. The other two, termed binary compilations, extend and specialise the logarithmic encoding introduced in previous literature. Our experimental analysis reveals that the seemingly overly complex logarithmic encoding can, surprisingly, be made practical with some representational expedients; among these, the use of axioms is particularly crucial. Furthermore, we identify a class of mildly numeric planning problems on which a classical planner, namely LAMA, run on the compiled problem is highly competitive with state-of-the-art numeric planners.
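To convey the flavor of the one-hot compilation, here is an illustrative Python sketch of how a single numeric increment could be rendered as conditional effects over Boolean facts (our toy rendering, not the paper's exact construction):

```python
def one_hot_increment(var, bound):
    """Toy one-hot compilation of the numeric effect `var += 1` for a
    variable with domain {0..bound}: the variable becomes bound+1 mutually
    exclusive Boolean facts, and the increment becomes one conditional
    effect per value, written here as (condition, delete, add) triples."""
    effects = []
    for k in range(bound):
        cond = f"{var}={k}"
        effects.append((cond, f"del {var}={k}", f"add {var}={k + 1}"))
    return effects

for eff in one_hot_increment("fuel", 3):
    print(eff)
# ('fuel=0', 'del fuel=0', 'add fuel=1') ... ('fuel=2', 'del fuel=2', 'add fuel=3')
```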
We consider temporal numeric planning problems Π expressed in PDDL2.1 and show how to produce SMT formulas (i) whose models correspond to valid plans of Π, and (ii) which extend the recently proposed planning-with-patterns approach from the numeric to the temporal case. We prove the correctness and completeness of the approach, and show that it outperforms all publicly available temporal planners on 10 domains with required concurrency.
In a temporal graph, the edge set changes over time according to a set of time-labels associated with each edge, indicating at which time-steps the edge is available. Two vertices are connected if there is a path between them in which the edges are traversed in increasing order of their labels. We study the problem of scheduling the availability times of the edges of a temporal graph so that all pairs of vertices are connected within a given maximum allowed time a and the overall number of labels is minimized. The problem, called Minimum Aged Labeling (MAL), has several applications in logistics, distribution scheduling, and information spreading in social networks, where carefully choosing the time-labels can significantly reduce infrastructure costs, fuel consumption, or greenhouse gas emissions. MAL has previously been proved NP-complete on undirected graphs and APX-hard on directed graphs. In this paper, we extend our knowledge of the complexity and approximability of MAL in several directions. We first show that the problem cannot be approximated within a factor better than O(log n) when a >= 2, unless P = NP, and within a factor better than 2^(log^(1-ε) n) when a >= 3, unless NP is contained in DTIME(2^(polylog(n))), where n is the number of vertices in the graph. Then we give a set of approximation algorithms that, under some conditions, almost match these lower bounds. In particular, we show that the approximation depends on a relation between a and the diameter of the input graph. We further establish a connection with a foundational optimization problem on static graphs, called Diameter Constrained Spanning Subgraph (DCSS), and show that our hardness results also apply to DCSS.
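The underlying connectivity notion (paths traversed in increasing label order) admits a simple earliest-arrival check; below is a minimal Python sketch under strict ordering, with an assumed edge-list layout:

```python
def earliest_arrival(n, labeled_edges, source):
    """Earliest time each vertex is reachable from `source` along a path
    whose edge labels strictly increase. `labeled_edges` is a list of
    (u, v, t) triples; processing edges in increasing label order makes
    a single pass sufficient. (Illustrative layout, undirected edges.)"""
    INF = float("inf")
    arrival = [INF] * n
    arrival[source] = 0
    for u, v, t in sorted(labeled_edges, key=lambda e: e[2]):
        if arrival[u] < t:
            arrival[v] = min(arrival[v], t)
        if arrival[v] < t:
            arrival[u] = min(arrival[u], t)
    return arrival

# Path 0 -(1)- 1 -(2)- 2 exists; edge (0,2) at time 1 also reaches 2 directly.
print(earliest_arrival(3, [(0, 1, 1), (1, 2, 2), (0, 2, 1)], source=0))  # [0, 1, 1]
```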
Logistics and transportation networks require a large amount of resources to realise necessary connections between locations, and minimizing these resources is a vital aspect of planning research. Since such networks have dynamic connections that are only available at specific times, intricate models are needed to portray them accurately. In this paper, we study the problem of minimizing the number of resources needed to realise a dynamic network, using the model of temporal graphs. In a temporal graph, edges appear at specific points in time. Given a temporal graph and a natural number k, we ask whether we can cover every temporal edge exactly once using at most k temporal journeys, where consecutive edges of a journey have to adhere to the order of time. We conduct a thorough investigation of the complexity of the problem with respect to four dimensions: (a) whether the temporal journeys are walks, trails, or paths; (b) whether the chronological order of edges in a journey is strict or non-strict; (c) whether the temporal graph is directed or undirected; (d) whether the start and end points of each journey are given. We almost completely resolve the complexity of these problems and provide dichotomies for each of them with respect to k.
Despite the widespread success of pattern database (PDB) heuristics in classical planning, to date there has been no application of PDBs to planning with numeric variables. In this paper we attempt to close this gap. We address optimal numeric planning involving conditions characterized by linear expressions and actions that modify numeric variables by constant quantities. Building upon prior research, we present an adaptation of PDB heuristics to numeric planning, introducing several approaches to deal with the unbounded nature of numeric variable projections. These approaches aim to restrict the initially infinite projections, thereby bounding the number of states and ultimately constraining the resulting PDBs. We show that the PDB heuristics obtained with our approach can provide strong guidance for the search.
Assignment problems are a classic family of combinatorial optimization problems in which a group of agents must be assigned to a group of tasks such that maximum utility is achieved while satisfying assignment constraints. Given the utility of each agent completing each task, polynomial-time algorithms exist to solve a single assignment problem in its simplest form. However, in many modern applications such as satellite constellations, power grids, and mobile robot scheduling, assignment problems unfold over time, with the utility of a given assignment depending heavily on the state of the system. We apply multi-agent reinforcement learning to this problem, learning the value of assignments by bootstrapping from the known polynomial-time greedy solver and then learning from further experience. We then choose assignments using a distributed optimal assignment mechanism rather than selecting them directly. We demonstrate that this algorithm is theoretically justified and avoids pitfalls experienced by other RL algorithms in this setting. Finally, we show that our algorithm significantly outperforms other methods in the literature, even while scaling to realistic scenarios with hundreds of agents and tasks.
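For reference, the optimal single assignment can be computed with SciPy's Hungarian-style solver, while a greedy solver of the kind the method bootstraps from takes a few lines; the utility matrix below is made up:

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

utility = np.array([[9.0, 2.0, 5.0],
                    [6.0, 8.0, 1.0],
                    [3.0, 7.0, 4.0]])  # utility[i, j]: agent i on task j (made-up numbers)

# Optimal assignment (Hungarian-style, polynomial time).
rows, cols = linear_sum_assignment(utility, maximize=True)
print("optimal:", list(zip(rows, cols)), "utility =", utility[rows, cols].sum())

# Greedy baseline: repeatedly take the best remaining (agent, task) pair.
def greedy(u):
    u = u.copy()
    pairs, total = [], 0.0
    for _ in range(min(u.shape)):
        i, j = np.unravel_index(np.argmax(u), u.shape)
        pairs.append((int(i), int(j)))
        total += u[i, j]
        u[i, :] = -np.inf   # agent i is taken
        u[:, j] = -np.inf   # task j is taken
    return pairs, total

print("greedy:", *greedy(utility))
```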
The application of graph neural networks (GNNs) to learn heuristic functions in classical planning is gaining traction. Despite the variety of methods proposed in the literature for encoding classical planning tasks for GNNs, a comparative study evaluating their relative performance has been lacking. Moreover, some encodings have been assessed solely for their expressiveness rather than their practical effectiveness in planning. This paper provides an extensive comparative analysis of existing encodings. Our results indicate that the smallest encoding, based on Gaifman graphs and not yet applied in planning, outperforms the rest due to its fast evaluation times and the informativeness of the resulting heuristic. The overall coverage measured on IPC benchmarks almost reaches that of the state-of-the-art planner LAMA, while exhibiting rather complementary strengths across domains.
This study investigates scheduling strategies for the stochastic resource-constrained project scheduling problem with maximal time lags (SRCPSP/max). Recent advances in Constraint Programming (CP) and Temporal Networks have renewed interest in evaluating the advantages and drawbacks of various proactive and reactive scheduling methods. First, we present a new, fully proactive CP-based method. Second, we show how a reactive approach can be constructed using an online rescheduling procedure. A third contribution is based on partial-order schedules and uses Simple Temporal Networks with Uncertainty (STNUs). Our statistical analysis shows that the STNU-based algorithm performs best in terms of solution quality, while also showing good relative offline and online computation times.
Large Language Models (LLMs) have shown promise in solving natural language-described planning tasks, but their direct use often leads to inconsistent reasoning and hallucination. While hybrid LLM-symbolic planning pipelines have emerged as a more robust alternative, they typically require extensive expert intervention to refine and validate generated action schemas. This not only limits scalability but also introduces potential for biased interpretation, as a single expert's reading of ambiguous natural language descriptions might not align with the user's actual intent. To address this, we propose a novel approach that constructs an action schema library to generate multiple candidates, accounting for the diverse possible interpretations of natural language descriptions. We further introduce a semantic validation and ranking module that automatically filters and ranks these candidates without an expert in the loop. Our experiments show that this pipeline maintains superior planning performance over the direct LLM planning approach. These findings demonstrate the feasibility of a fully automated, end-to-end LLM-symbolic planner that requires no expert intervention, opening up the possibility for a broader audience to engage with AI planning with fewer prerequisites in domain expertise.
The long-run average payoff per transition (mean payoff) is the main tool for specifying the performance and dependability properties of discrete systems. The problem of constructing a controller (strategy) simultaneously optimizing several mean payoffs has been deeply studied for stochastic and game-theoretic models. One common issue of the constructed controllers is the instability of the mean payoffs, measured by the deviations of the average rewards per transition computed in a finite "window" sliding along a run. Unfortunately, the problem of simultaneously optimizing the mean payoffs under local stability constraints is computationally hard, and the existing works do not provide a practically usable algorithm even for non-stochastic models such as two-player games. In this paper, we design and evaluate the first efficient and scalable solution to this problem applicable to Markov decision processes.
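The window-based stability measure can be stated in a few lines; the following Python check is an illustrative rendering (the window length and the admissible band are our assumptions):

```python
def window_stable(rewards, w, lo, hi):
    """Check the local-stability constraint: the average reward in every
    length-`w` window sliding along the run must stay within [lo, hi].
    (Illustrative rendering of the window condition from the abstract.)"""
    return all(
        lo <= sum(rewards[i:i + w]) / w <= hi
        for i in range(len(rewards) - w + 1)
    )

run = [1, 0, 1, 1, 0, 1, 1, 0, 1]                # mean payoff 2/3 on this finite prefix
print(window_stable(run, w=3, lo=0.5, hi=1.0))   # True: every 3-window averages 2/3
print(window_stable(run, w=2, lo=0.6, hi=1.0))   # False: the window [1, 0] averages 0.5
```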
There is an increasing body of work using Large Language Models (LLMs) as agents for orchestrating workflows and making decisions in domains that require planning and multi-step reasoning. As a result, it is imperative to evaluate LLMs on the core skills required for planning. In this work, we present ACPBench, a benchmark for evaluating reasoning tasks in the field of planning. The benchmark consists of 7 reasoning tasks over 13 planning domains. The collection is constructed from planning domains described in a formal language. This allows us to synthesize problems with provably correct solutions across many tasks and domains. Further, it allows us the luxury of scale without additional human effort, i.e., many additional problems can be created automatically. Our extensive evaluation of 21 LLMs and OpenAI o1 reasoning models highlights the significant gap in the reasoning capabilities of current LLMs. Our findings with OpenAI o1, a multi-turn reasoning model, reveal significant gains in performance on multiple-choice questions, yet surprisingly, no notable progress on boolean questions.
Constrained Markov decision processes (CMDPs), in which the agent optimizes expected payoffs while keeping the expected cost below a given threshold, are the leading framework for safe sequential decision making under stochastic uncertainty. Among algorithms for planning and learning in CMDPs, methods based on Monte Carlo tree search (MCTS) are of particular importance due to their efficiency and extensibility to more complex frameworks (such as partially observable settings and games). However, current MCTS-based methods for CMDPs either struggle to find safe (i.e., constraint-satisfying) policies, or are too conservative and do not find valuable policies. We introduce Threshold UCT (T-UCT), an online MCTS-based algorithm for CMDP planning. Unlike previous MCTS-based CMDP planners, T-UCT explicitly estimates Pareto curves of cost-utility trade-offs throughout the search tree, using these together with a novel action selection and threshold update rule to seek safe and valuable policies. Our experiments demonstrate that our approach significantly outperforms state-of-the-art methods from the literature.
One of the major techniques for tackling temporal planning problems is heuristic search augmented with a symbolic representation of time in the states. Augmenting the problem with composite actions (macro-actions) is a simple and powerful approach to creating "shortcuts" in the search space, at the cost of increasing the branching factor of the problem and thus the expansion time of a heuristic search planner. Hence, it is of paramount importance to select the right macro-actions and to minimize their number in order to optimize planner performance. In this paper, we first discuss a simple yet powerful model similar to macro-actions for the case of temporal planning, which we call macro-events. Then, we present a novel ranking function to extract and select a suitable set of macro-events from a dataset of valid plans. Our ranking approach considers an estimate of the hypothetical search space of a blind search that includes a candidate set of macro-events, under four different exploitation schemata. Finally, we experimentally demonstrate that the proposed approach yields a substantial performance improvement for a state-of-the-art temporal planner.
Formulating a real-world problem under the Reinforcement Learning framework involves non-trivial design choices, such as selecting a discount factor for the learning objective (discounted cumulative rewards), which articulates the planning horizon of the agent. This work investigates the impact of the discount factor on the bias-variance trade-off given structural parameters of the underlying Markov Decision Process. Our results support the idea that a shorter planning horizon might be beneficial, especially under partial observability.
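The link between the discount factor and the planning horizon admits a small worked computation: 1/(1-γ) acts as an effective horizon, and γ^k is the weight the objective places on rewards beyond step k (the concrete values below are illustrative):

```python
# How much of the discounted objective lies beyond step k: gamma^k.
# The "effective horizon" 1/(1 - gamma) summarizes this decay.
for gamma in (0.9, 0.99):
    horizon = 1.0 / (1.0 - gamma)
    print(f"gamma={gamma}: effective horizon ~ {horizon:.0f} steps, "
          f"weight beyond step 100: {gamma ** 100:.4f}")
# gamma=0.9 : horizon ~ 10 steps,  weight beyond step 100 ~ 0.0000
# gamma=0.99: horizon ~ 100 steps, weight beyond step 100 ~ 0.3660
```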
The hardness of modeling a planning domain is a major obstacle to making automated planning techniques accessible. We developed a tool that helps modelers correct domains based on available information, such as the known feasibility or infeasibility of certain plans. Previous work on model repair strategies capable of automatically repairing flawed planning domains has used positive plans (invalid in the given (flawed) domain but feasible in the ``true'' domain). In this work, we highlight the importance of, and study, counter-example negative plans (valid in the given (flawed) domain but infeasible in the ``true'' domain). Our approach automatically corrects a domain by finding an optimal repair set that turns all negative plans into non-solutions, in addition to making all positive plans solutions. Experiments indicate strong performance on the Fast Downward benchmark suite with random errors. A handcrafted benchmark with domain flaws inspired by practical applications further demonstrates the method's efficacy.
We consider infinite-horizon Markov Decision Processes where parameters, such as transition probabilities, are unknown and estimated from data. The popular distributionally robust approach to addressing the parameter uncertainty can sometimes be overly conservative. In this paper, we utilize the recently proposed formulation, Bayesian risk Markov Decision Process (BR-MDP), to address parameter (or epistemic) uncertainty in MDPs. To solve the infinite-horizon BR-MDP with a class of convex risk measures, we propose a computationally efficient approach called approximate bilevel difference convex programming (ABDCP). The optimization is performed offline and produces the optimal policy that is represented as a finite state controller with desirable performance guarantees. We also demonstrate the empirical performance of the BR-MDP formulation and the proposed algorithm.
The rapid advancement of autonomous web navigation has significantly benefited from grounding pretrained Large Language Models (LLMs) as agents. However, current research has yet to fully leverage the redundancy of HTML elements for contrastive training. This paper introduces a novel approach to LLM-based web navigation tasks, called Web Element Preference Optimization (WEPO). WEPO utilizes unsupervised preference learning by sampling distance-based non-salient web elements as negative samples, optimizing the maximum likelihood objective within Direct Preference Optimization (DPO). We evaluate WEPO on the Mind2Web benchmark and empirically demonstrate that WEPO aligns users' high-level intent with output actions more effectively. The results show that our method achieves state-of-the-art performance, with an improvement of 13.8% over WebAgent and 5.3% over the visual language model baseline CogAgent. Our findings underscore the potential of preference optimization to enhance web navigation and other web-page-based tasks, suggesting a promising direction for future research.
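For reference, the DPO objective that WEPO builds on reduces, per preference pair, to a logistic loss over log-probability margins; below is a minimal NumPy sketch with made-up log-probabilities (this is the generic DPO loss, not WEPO's full pipeline):

```python
import numpy as np

def dpo_loss(logp_w, logp_l, ref_logp_w, ref_logp_l, beta=0.1):
    """Direct Preference Optimization loss for one (preferred, dispreferred)
    pair: -log sigmoid(beta * (margin_w - margin_l)), where each margin is
    the policy log-prob minus the reference log-prob. In WEPO the dispreferred
    sample would be a sampled non-salient web element (per the abstract)."""
    margin = beta * ((logp_w - ref_logp_w) - (logp_l - ref_logp_l))
    return -np.log(1.0 / (1.0 + np.exp(-margin)))  # -log sigmoid(margin)

# Made-up log-probabilities: the policy already prefers the salient element.
print(dpo_loss(logp_w=-1.0, logp_l=-3.0, ref_logp_w=-2.0, ref_logp_l=-2.0))
```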
Despite significant recent advances in the field of Deep Reinforcement Learning (DRL), such methods typically incur a high training cost to learn effective policies, posing cost and safety challenges in many practical applications. To improve the learning efficiency of (D)RL methods, transfer learning (TL) has emerged as a promising approach to leverage prior experience on a source domain to speed learning on a new, but related, target domain. In this paper, we take a novel model-informed approach to TL in DRL by assuming knowledge of both the source and target domain models (which is the case in the prevalent setting of DRL with simulators). While directly solving either the source or target MDP via solution methods like value iteration is computationally prohibitive, we exploit the fact that if the target and source MDPs differ only by a small structural change in their rewards, we can apply structured value iteration methods, in a procedure we term ModelDiff, to solve the much smaller target-source ``Diff'' MDP for a reasonable horizon. ModelDiff can then be integrated into extensions of standard DRL algorithms such as ModelDiff (MD) DQN, where it provides provable lower-bound guidance to DQN that often speeds convergence in the positive-transfer case while, critically, avoiding decelerated learning in the negative-transfer case. Experiments show that MD-DQN matches or outperforms existing TL methods and baselines in both positive- and negative-transfer settings.
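One way to picture why a reward-difference MDP can yield a provable lower bound (our reading in a toy tabular setting, not the paper's ModelDiff procedure): for a fixed policy, values are linear in rewards, so evaluating the source-optimal policy on the reward difference corrects the source values into a lower bound on the optimal target values.

```python
import numpy as np

def policy_eval(P_pi, r_pi, gamma=0.95, iters=500):
    """Evaluate a fixed policy: P_pi is the (S, S) transition matrix and
    r_pi the (S,) reward vector induced by the policy."""
    V = np.zeros(len(r_pi))
    for _ in range(iters):
        V = r_pi + gamma * P_pi @ V
    return V

# Suppose pi* is optimal for the source MDP, with values V_src. Since V^pi
# is linear in the rewards for a fixed pi, evaluating pi* on the reward
# *difference* r_tgt - r_src gives a correction, and
#   V_src + correction = V^{pi*}_target <= V*_target,
# i.e., a provable lower bound on target values (our reading of the idea).
P_pi = np.array([[0.9, 0.1], [0.2, 0.8]])
r_src = np.array([1.0, 0.0])
r_tgt = np.array([1.0, 0.5])            # small structural reward change (made up)
V_src = policy_eval(P_pi, r_src)
lower_bound = V_src + policy_eval(P_pi, r_tgt - r_src)
print(lower_bound)
```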
Markov decision processes (MDP) are a well-established model for sequential decision-making in the presence of probabilities. In *robust* MDP (RMDP), every action is associated with an *uncertainty set* of probability distributions, modelling that transition probabilities are not known precisely. Based on the known theoretical connection to stochastic games, we provide a framework for solving RMDPs that is generic, reliable, and efficient. It is *generic* both with respect to the model, allowing for a wide range of uncertainty sets, including but not limited to intervals, L1- or L2-balls, and polytopes; and with respect to the objective, including long-run average reward, undiscounted total reward, and stochastic shortest path. It is *reliable*, as our approach not only converges in the limit, but provides precision guarantees at any time during the computation. It is *efficient* because -- in contrast to state-of-the-art approaches -- it avoids explicitly constructing the underlying stochastic game. Consequently, our prototype implementation outperforms existing tools by several orders of magnitude and can solve RMDPs with a million states in under a minute.
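To see why simple uncertainty sets avoid the explicit stochastic game, consider the adversarial inner step of a robust Bellman backup for interval sets, which admits a greedy solution; the following Python sketch uses made-up data:

```python
def worst_case_expectation(values, lo, hi):
    """Adversarial inner step of a robust Bellman backup for interval
    uncertainty sets: minimize sum_i p_i * values_i subject to
    lo_i <= p_i <= hi_i and sum_i p_i = 1. Greedy: start from the lower
    bounds and pour the remaining mass into the lowest-value states first."""
    p = list(lo)
    budget = 1.0 - sum(lo)
    for i in sorted(range(len(values)), key=lambda i: values[i]):
        extra = min(hi[i] - lo[i], budget)
        p[i] += extra
        budget -= extra
    assert abs(budget) < 1e-9, "intervals must admit a distribution"
    return sum(pi * v for pi, v in zip(p, values))

# Successor values 10, 4, 7 with interval bounds on each transition probability.
print(worst_case_expectation([10.0, 4.0, 7.0],
                             lo=[0.1, 0.2, 0.1],
                             hi=[0.5, 0.6, 0.5]))  # 5.5
```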
Language-conditioned robotic manipulation in unstructured environments presents significant challenges for intelligent robotic systems. However, due to partial observability or imprecise action prediction, failures may be unavoidable for learned policies. Moreover, operational failures can lead the robotic arm into an untrained state, potentially causing destructive results. Consequently, the ability to detect and self-correct failures is crucial for the development of practical robotic systems. To address this challenge, we propose a foresight-driven failure detection and self-correction module for robot manipulation. Leveraging 3D Gaussian Splatting, we represent the current scene with multiple Gaussians. We then train a prediction network to forecast the Gaussian representation of future scenes conditioned on planned actions. A failure is detected when the predicted future deviates significantly from the real observation after action execution. In such cases, the end-effector rolls back to the previous action to avoid an untrained state. Integrating this approach with the PerACT framework, we develop a self-correcting robot manipulation policy. Evaluations on ten RLBench tasks with 166 variations demonstrate the superior performance of the proposed method, which outperforms state-of-the-art methods by an average of 12.0% in success rate.
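The detection rule can be pictured as a simple deviation threshold; the following sketch is purely illustrative (flattened feature vectors stand in for the Gaussian scene representation, and the metric and threshold are our assumptions):

```python
import numpy as np

def failure_detected(predicted, observed, threshold):
    """Flag a failure when the predicted future scene representation
    deviates too much from the real post-action observation. Here both
    are flattened feature vectors standing in for Gaussian parameters
    (means, covariances, opacities); the metric and threshold are ours."""
    return np.linalg.norm(predicted - observed) > threshold

pred = np.array([0.0, 1.0, 0.5])
obs = np.array([0.1, 0.9, 2.0])
print(failure_detected(pred, obs, threshold=1.0))  # True: large deviation
```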
Monte-Carlo Tree Search (MCTS) is a popular approach to online planning under uncertainty. While MCTS uses statistical sampling via multi-armed bandits to avoid exhaustive search in complex domains, common closed-loop approaches typically construct enormous search trees to consider a large number of potential observations and actions. Open-loop approaches, on the other hand, offer better memory efficiency by ignoring observations, but are generally not competitive with closed-loop MCTS in terms of performance, even when human knowledge is integrated, as is commonly done. In this paper, we propose Counterfactual Open-loop Reasoning with Ad hoc Learning (CORAL) for open-loop MCTS, using a causal multi-armed bandit approach with unobserved confounders (MABUC). CORAL consists of two online learning phases that are conducted during the open-loop search. In the first phase, observational values are learned based on preferred actions. In the second phase, counterfactual values are learned with MABUCs to make a decision via an intent policy obtained from the observational values. We evaluate CORAL in four POMDP benchmark scenarios and compare it with closed-loop and open-loop alternatives. In contrast to standard open-loop MCTS, CORAL achieves competitive performance compared with closed-loop algorithms while constructing significantly smaller search trees.
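For context, the statistical sampling inside MCTS is typically UCB1-style arm selection; the sketch below shows plain UCB1, with a comment on where CORAL's counterfactual estimates would enter (illustrative, not CORAL's implementation):

```python
import math

def ucb1(counts, values, c=math.sqrt(2)):
    """UCB1 arm selection as used inside MCTS: pick the arm maximizing
    mean value plus an exploration bonus. In CORAL the mean would be a
    counterfactual value estimate conditioned on the intent policy's
    preferred action (per the abstract); here it is a plain average."""
    total = sum(counts)
    def score(i):
        if counts[i] == 0:
            return math.inf                     # try every arm once
        return values[i] / counts[i] + c * math.sqrt(math.log(total) / counts[i])
    return max(range(len(counts)), key=score)

# Arm 1 has the higher average, but under-explored arm 2 wins the bonus.
print(ucb1(counts=[10, 12, 2], values=[4.0, 7.0, 1.0]))  # 2
```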
We present a methodology based on interactive theorem proving that facilitates the development of verified implementations of algorithms for solving factored Markov Decision Processes. As a case study, we formally verify an algorithm for approximate policy iteration in the proof assistant Isabelle/HOL. We show how the verified algorithm can be refined to an executable, verified implementation. Our evaluation on benchmark problems shows that it is practical. As part of the development, we build verified software to certify linear programming solutions. We discuss the verification process and the modifications we made to the algorithm during formalization.