AAAI.2026 - Planning, Routing, and Scheduling | Cool Papers

#1 Best-Effort Policies for Robust Markov Decision Processes [PDF] [Copy] [Kimi] [REL]

Authors: Alessandro Abate, Thom Badings, Giuseppe De Giacomo, Francesco Fabiano

We study the common generalization of Markov decision processes (MDPs) with sets of transition probabilities, known as robust MDPs (RMDPs). A standard goal in RMDPs is to compute a policy that maximizes the expected return under an adversarial choice of the transition probabilities. If the uncertainty in the probabilities is independent between the states, known as s-rectangularity, such optimal robust policies can be computed efficiently using robust value iteration. However, there might still be multiple optimal robust policies, which, while equivalent with respect to the worst-case, reflect different expected returns under non-adversarial choices of the transition probabilities. Hence, we propose a refined policy selection criterion for RMDPs, drawing inspiration from the notions of dominance and best-effort in game theory. Instead of seeking a policy that only maximizes the worst-case expected return, we additionally require the policy to achieve a maximal expected return under different (i.e., not fully adversarial) transition probabilities. We call such a policy an optimal robust best-effort (ORBE) policy. We prove that ORBE policies always exist, characterize their structure, and present an algorithm to compute them with a manageable overhead over standard robust value iteration. ORBE policies offer a principled tie-breaker among optimal robust policies. Numerical experiments show the feasibility of our approach.

Subject: AAAI.2026 - Planning, Routing, and Scheduling

#2 A Switching Framework for Online Interval Scheduling with Predictions [PDF] [Copy] [Kimi] [REL]

Authors: Antonios Antoniadis, Ali Shahheidar, Golnoosh Shahkarami, Abolfazl Soltani

We study online interval scheduling in the irrevocable setting, where each interval must be immediately accepted or rejected upon arrival. The objective is to maximize the total length of accepted intervals while ensuring that no two accepted intervals overlap. We consider this problem in a learning-augmented setting, where the algorithm has access to (machine-learned) predictions. The goal is to design algorithms that leverage these predictions to improve performance while maintaining robust guarantees in the presence of prediction errors. Our main contribution is the SemiTrust-and-Switch framework, which provides a unified approach for combining prediction-based and classical interval scheduling algorithms. This framework applies to both deterministic and randomized algorithms and captures the trade-off between consistency (performance under accurate predictions) and robustness (performance under adversarial inputs). Moreover, we provide lower bounds, proving the tightness of this framework in particular settings. We further design a randomized algorithm that smoothly interpolates between prediction-based and robust algorithms. This algorithm achieves both robustness and smoothness-its performance degrades gracefully with the quality of the prediction.

Subject: AAAI.2026 - Planning, Routing, and Scheduling

#3 Qualitative Analysis of ω-Regular Objectives on Robust MDPs [PDF] [Copy] [Kimi] [REL]

Authors: Ali Asadi, Krishnendu Chatterjee, Ehsan Kafshdar Goharshady, Mehrdad Karrabi, Ali Shafiee

Robust Markov Decision Processes (RMDPs) generalize classical MDPs that consider uncertainties in transition probabilities by defining a set of possible transition functions. An objective is a set of runs (or infinite trajectories) of the RMDP, and the value for an objective is the maximal probability that the agent can guarantee against the adversarial environment. We consider (a) reachability objectives, where given a target set of states, the goal is to eventually arrive at one of them; and (b) parity objectives, which are a canonical representation for ω-regular objectives. The qualitative analysis problem asks whether the objective can be ensured with probability 1. In this work, we study the qualitative problem for reachability and parity objectives on RMDPs without making any assumption over the structures of the RMDPs, e.g., unichain or aperiodic. Our contributions are twofold. We first present efficient algorithms with oracle access to uncertainty sets that solve qualitative problems of reachability and parity objectives. We then report experimental results demonstrating the effectiveness of our oracle-based approach on classical RMDP examples from the literature scaling up to thousands of states.

Subject: AAAI.2026 - Planning, Routing, and Scheduling

#4 Revealing POMDPs: Qualitative and Quantitative Analysis for Parity Objectives [PDF] [Copy] [Kimi] [REL]

Authors: Ali Asadi, Krishnendu Chatterjee, David Lurie, Raimundo Saona

Partially observable Markov decision processes (POMDPs) are a central model for uncertainty in sequential decision making. The most basic objective is the reachability objective, where a target set must be eventually visited, and the more general parity objectives can model all omega-regular specifications. For such objectives, the computational analysis problems are the following: (a) qualitative analysis that asks whether the objective can be satisfied with probability 1 (almost-sure winning) or probability arbitrarily close to 1 (limit-sure winning); and (b) quantitative analysis that asks for the approximation of the optimal probability of satisfying the objective. For general POMDPs, almost-sure analysis for reachability objectives is EXPTIME-complete, but limit-sure and quantitative analyses for reachability objectives are undecidable; almost-sure, limit-sure, and quantitative analyses for parity objectives are all undecidable. A special class of POMDPs, called revealing POMDPs, has been studied recently in several works, and for this subclass the almost-sure analysis for parity objectives was shown to be EXPTIME-complete. In this work, we show that for revealing POMDPs the limit-sure analysis for parity objectives is EXPTIME-complete, and even the quantitative analysis for parity objectives can be achieved in EXPTIME.

Subject: AAAI.2026 - Planning, Routing, and Scheduling

#5 Bilevel MCTS for Amortized O(1) Node Selection in Classical Planning [PDF] [Copy] [Kimi] [REL]

Author: Masataro Asai

We study an efficient implementation of Multi-Armed Bandit (MAB)-based Monte-Carlo Tree Search (MCTS) for classical planning. One weakness of MCTS is that it spends a significant time in deciding which node to expand next. While selecting a node from an OPEN list with N nodes has O(1) runtime complexity with traditional array-based priority-queues for dense integer keys, the tree-based OPEN list used by MCTS requires O(log N), which roughly corresponds to the search depth d. In classical planning, d is arbitrarily large (e.g., 2^k-1 in k-disk Tower-of-Hanoi) and the runtime for node selection is significant, unlike in game tree search, where the cost is negligible compared to the node evaluation (rollouts) because d is inherently limited by the game (e.g. d≦361 in Go). To improve this bottleneck, we propose a bilevel modification to MCTS that runs a best-first search from each selected leaf nodes with an expansion budget proportional to d, which achieves amortized O(1) runtime for node selection, equivalent to traditional queue-based OPEN list. In addition, we introduce Tree Collapsing, an enhancement that reduces action selection steps and further improves the performance.

Subject: AAAI.2026 - Planning, Routing, and Scheduling

#6 Extreme Value Monte Carlo Tree Search for Classical Planning [PDF] [Copy] [Kimi] [REL]

Authors: Masataro Asai, Stephen Wissow

Despite being successful in board games and reinforcement learning (RL), Monte Carlo Tree Search (MCTS) combined with Multi Armed Bandit (MAB) has seen limited success in domain-independent classical planning until recently. Previous work (Wissow and Asai, 2024) showed that UCB1, designed for bounded rewards, does not perform well as applied to cost-to-go estimates in classical planning, because cost-to-go estimates are unbounded, and showed improved performance using a Gaussian reward MAB instead. This paper further sharpens our understanding of ideal bandits for planning tasks. Existing work has two issues: first, Gaussian MABs under-specify the support of cost-to-go estimates as (-∞, ∞), which we can narrow down. Second, Full Bellman backup (Schulte and Keller, 2014) that backpropagates sample max/min lacks theoretical justification. We use Peaks-Over-Threashold Extreme Value Theory to resolve both issues at once, propose a new bandit algorithm (UCB1-Uniform). We formally prove its regret bound and empirically demonstrate its performance in classical planning.

Subject: AAAI.2026 - Planning, Routing, and Scheduling

#7 Learning Heuristic Functions with Graph Neural Networks for Numeric Planning [PDF] [Copy] [Kimi] [REL]

Authors: Valerio Borelli, Alfonso Gerevini, Enrico Scala, Ivan Serina

In this paper, we investigate the application of heuristics based on Graph Neural Networks (GNNs) to lifted numeric planning problems, an area that has been relatively unexplored. Building upon the GNN approach for learning general policies proposed by Ståhlberg, Bonet, and Geffner (2022b), we extend the architecture to make it sensitive to the numeric components inherent in the planning problems we address. We achieve this by observing that, although the state space of a numeric planning problem is infinite, the finite subgoal structure of the problem can be incorporated into the architecture, enabling the construction of a finite structure. Instead of learning general policies, we train our models to serve as heuristics within a best-first search algorithm. We explore various configurations of this architecture and demonstrate that the resulting heuristics are highly informative and, in certain domains, offer a better trade-off between guidance and computational cost compared to state-of-the-art heuristics.

Subject: AAAI.2026 - Planning, Routing, and Scheduling

#8 CABTO: Context-Aware Behavior Tree Grounding for Robot Manipulation [PDF] [Copy] [Kimi] [REL]

Authors: Yishuai Cai, Xinglin Chen, Yunxin Mao, Kun Hu, Minglong Li

Behavior Trees (BTs) offer a powerful paradigm for designing modular and reactive robot controllers. BT planning, an emerging field, provides theoretical guarantees for the automated generation of reliable BTs. However, BT planning typically assumes that a well-designed BT system is already grounded—comprising high-level action models and low-level control policies—which often requires extensive expert knowledge and manual effort. In this paper, we formalize the BT Grounding problem: the automated construction of a complete and consistent BT system. We analyze its complexity and introduce CABTO (Context-Aware Behavior Tree grOunding), the first framework to efficiently solve this challenge. CABTO leverages pre-trained Large Models (LMs) to heuristically search the space of action models and control policies, guided by contextual feedback from BT planners and environmental observations. Experiments spanning seven task sets across three distinct robotic manipulation scenarios demonstrate CABTO’s effectiveness and efficiency in generating complete and consistent behavior tree systems.

Subject: AAAI.2026 - Planning, Routing, and Scheduling

#9 Probabilistic Hierarchical Goal Network Planning with UCT [PDF] [Copy] [Kimi] [REL]

Authors: David H. Chan, Mark Roberts, Dana S. Nau

Hierarchical goal networks (HGNs) provide a framework for goal-directed planning by decomposing high-level goals into ordered subgoals. While prior work has examined non-determinism for hierarchical planning (specifically, HTNs), scant work studies how HGNs can help in stochastic settings. We introduce a formalism for probabilistic HGN planning with action-insertion semantics, enabling probabilistic planners to incorporate domain knowledge from goal decomposition methods. We design and evaluate two UCT-based algorithms for solving probabilistic HGN planning problems: an asymptotically optimal approach and a compressed, shared-value approach that optimizes separately for each goal within the goal-subgoal hierarchy. We compare our two UCT-based HGN search algorithms experimentally on modified benchmark domains from the FOND HTN literature. Our results demonstrate that on larger problems, the compressed search converges more quickly and outperforms the asymptotically optimal search. This suggests that HGNs can be effective in probabilistic planning, and compression may yield better performance on large problems in anytime settings with stochastic action outcomes.

Subject: AAAI.2026 - Planning, Routing, and Scheduling

#10 Satisficing and Optimal Generalised Planning via Goal Regression [PDF] [Copy] [Kimi] [REL]

Authors: Dillon Z. Chen, Till Hofmann, Toryn Q. Klassen, Sheila A. McIlraith

Generalised planning (GP) refers to the task of synthesising programs that solve families of related planning problems. We introduce a novel, yet simple method for GP: given a set of training problems, for each problem, compute an optimal plan for each goal atom in some order, perform goal regression on the resulting plans, and lift the corresponding outputs to obtain a set of first-order Condition → Actions rules. The rules collectively constitute a generalised plan that can be executed as is or alternatively be used to prune the planning search space. We formalise and prove the conditions under which our method is guaranteed to learn valid generalised plans and state space pruning axioms for search. Experiments demonstrate significant improvements over state-of-the-art (generalised) planners with respect to the 3 metrics of synthesis cost, planning coverage, and solution quality on various classical and numeric planning domains.

Subject: AAAI.2026 - Planning, Routing, and Scheduling

#11 A Domain-specific Heuristic for PDDL+-based Traffic Signal Optimisation [PDF] [Copy] [Kimi] [REL]

Authors: Francesco Doria, Francesco Percassi, Marco Maratea, Mauro Vallati

Optimising traffic signals is crucial for mitigating urban congestion, and automated planning, particularly with PDDL+, has shown promise for real-world deployment due to its flexibility and centralised perspective. While existing PDDL+ models guarantee deployability on current infrastructure, they face significant limitations: reliance on domain-independent heuristics restricts their applicability and scalability, leading to slow solution generation and unclear plan quality. To overcome these challenges and unlock the widespread adoption of planning-based traffic control, we introduce hCAFE, a domain-specific heuristic for PDDL+-based traffic signal optimisation. Unlike prior approaches, hCAFE is designed to work effectively across multiple problem encodings, addressing a key limitation of traditional domain-specific heuristics. We demonstrate its capabilities on real-world data from a region of the UK, showing significant improvements in solution generation time and search space exploration. Our evaluation also compares the strategies generated by hCAFE against historical data from existing traffic control systems and a non-deployable benchmark, confirming the high quality of the resulting plans.

Subject: AAAI.2026 - Planning, Routing, and Scheduling

#12 Universal Safety Controllers with Learned Prophecies [PDF] [Copy] [Kimi] [REL]

Authors: Bernd Finkbeiner, Niklas Metzger, Satya Prakash Nayak, Anne-Kathrin Schmuck

Universal Safety Controllers (USCs) are a promising logical control framework that guarantees the satisfaction of a given temporal safety specification when applied to any realizable plant model. Unlike traditional methods, which synthesize one logical controller over a given detailed plant model, USC synthesis constructs a generic controller whose outputs are conditioned by plant behavior, called prophecies. Thereby, USCs offer strong generalization and scalability benefits over classical logical controllers. However, the exact computation and verification of prophecies remain computationally challenging. In this paper, we introduce an approximation algorithm for USC synthesis that addresses these limitations via learning. Instead of computing exact prophecies, which reason about sets of trees via automata, we only compute under- and over-approximations from (small) example plants and infer computation tree logic (CTL) formulas as representations of prophecies. The resulting USC generalizes to unseen plants via a verification step and offers improved efficiency and explainability through small and concise CTL prophecies, which remain human-readable and interpretable. Experimental results demonstrate that our learned prophecies remain generalizable, yet are significantly more compact and interpretable than their exact tree automata representations.

Subject: AAAI.2026 - Planning, Routing, and Scheduling

#13 Managing Infinite Abstractions in Numeric Pattern Database Heuristics [PDF] [Copy] [Kimi] [REL]

Authors: Markus Fritzsche, Daniel Gnad, Mikhail Gruntov, Alexander Shleyfman

Pattern Database (PDB) heuristics are an established approach in optimal classical planning that is used in state-of-the-art planning systems. PDBs are based on projections, which induce an abstraction of the original problem. Computing all cheapest plans in the abstraction yields an admissible heuristic. Despite their success, PDBs have only recently been adapted to numeric planning, which extends classical planning with numeric state variables. The difficulty in supporting numeric variables is that the induced abstractions, in contrast to classical planning, are generally infinite. Thus, they cannot be explored exhaustively to compute a heuristic. The foundational work that introduced numeric PDBs employed a simple approach that computes only a finite part of the abstraction. We analyze this framework and identify cases where it necessarily results in an uninformed heuristic. We propose several improvements over the basic variant of numeric PDBs that lead to enhanced heuristic accuracy.

Subject: AAAI.2026 - Planning, Routing, and Scheduling

#14 Symmetry-Aware Transformer Training for Automated Planning [PDF] [Copy] [Kimi] [REL]

Authors: Markus Fritzsche, Elliot Gestrin, Jendrik Seipp

While transformers excel in many settings, their application in the field of automated planning is limited. Prior work like PlanGPT, a state-of-the-art decoder-only transformer, struggles with extrapolation from easy to hard planning problems. This in turn stems from problem symmetries: planning tasks can be represented with arbitrary variable names that carry no meaning beyond being identifiers. This causes a combinatorial explosion of equivalent representations that pure transformers cannot efficiently learn from. We propose a novel contrastive learning objective to make transformers symmetry-aware and thereby compensate for their lack of inductive bias. Combining this with architectural improvements, we show that transformers can be efficiently trained for either plan-generation or heuristic-prediction. Our results across multiple planning domains demonstrate that our symmetry-aware training effectively and efficiently addresses the limitations of PlanGPT.

Subject: AAAI.2026 - Planning, Routing, and Scheduling

#15 Robust Out-of-Order Retrieval for Grid-Based Storage at Maximum Capacity [PDF] [Copy] [Kimi] [REL]

Authors: Tzvika Geft, William Zhang, Jingjin Yu, Kostas Bekris

This paper proposes a framework for improving the operational efficiency of automated storage systems under uncertainty. It considers a 2D grid-based storage for uniform-sized loads (e.g., containers, pallets, or totes), which are moved by a robot (or other manipulator) along a collision-free path in the grid. The loads are labeled (i.e., unique) and must be stored in a given sequence, and later be retrieved in a different sequence---an operational pattern that arises in logistics applications, such as last-mile distribution centers and shipyards. The objective is to minimize the load relocations to ensure efficient retrieval. A previous result guarantees a zero-relocation solution for known storage and retrieval sequences, even for storage at full capacity, provided that the side of the grid through which loads are stored/retrieved is at least 3 cells wide. However, in practice, the retrieval sequence can change after the storage phase. To address such uncertainty, this work investigates k-bounded perturbations during retrieval, under which any two loads may depart out of order if they are originally at most k positions apart. We prove that a Theta(k) grid width is necessary and sufficient for eliminating relocations at maximum capacity. We also provide an efficient solver for computing a storage arrangement that is robust to such perturbations. To address the higher-uncertainty case where perturbations exceed k, a strategy is introduced to effectively minimize relocations. Extensive experiments show that, for k up to half the grid width, the proposed storage-retrieval framework essentially eliminates relocations. For k values up to the full grid width, relocations are reduced by 50%+.

Subject: AAAI.2026 - Planning, Routing, and Scheduling

#16 Constrained and Robust Policy Synthesis with Satisfiability-Modulo-Probabilistic-Model-Checking [PDF] [Copy] [Kimi] [REL]

Authors: Linus Heck, Filip Macák, Milan Češka, Sebastian Junges

The ability to compute reward-optimal policies for given and known finite Markov decision processes (MDPs) underpins a variety of applications across planning, controller synthesis, and verification. However, we often want policies (1) to be robust, i.e., they perform well on perturbations of the MDP and (2) to satisfy additional structural constraints regarding, e.g., their representation or implementation cost. Computing such robust and constrained policies is indeed computationally more challenging. This paper contributes the first approach to effectively compute robust policies subject to arbitrary structural constraints using a flexible and efficient framework. We achieve flexibility by allowing to express our constraints in a first-order theory over a set of MDPs, while the root for our efficiency lies in the tight integration of satisfiability solvers to handle the combinatorial nature of the problem and probabilistic model checking algorithms to handle the analysis of MDPs. Experiments on a few hundred benchmarks demonstrate the feasibility for constrained and robust policy synthesis and the competitiveness with state-of-the-art methods for various fragments of the problem.

Subject: AAAI.2026 - Planning, Routing, and Scheduling

#17 Learning Heuristic Functions for HTN Planning [PDF] [Copy] [Kimi] [REL]

Author: Daniel Höller

In recent years, ML-based heuristic functions for automated planning have shown increasing performance. A main challenge is the level of generalization required in planning: techniques must generalize at least across different instances of the same domain (which results in different sizes of learning input). A common approach to overcome the issue is to use graph representations as input. While GNNs are a natural choice for learning, other methods have recently been favored because they show better runtime performance and need less training data. However, existing work has so far been limited to non-hierarchical planning. We describe the first approach to learn heuristics for hierarchical planning. We extend the Instance Learning Graph – a graph structure used in non-hierarchical planning – to the new setting and show how to learn heuristic functions based on it. Since our heuristics are applicable to the lifted model, there is no need to ground it. We therefore combine it with a novel lifted HTN planning system. Like recent systems in non-hierarchical planning, it grounds the search space explored so far, but not the entire model prior to search. Our evaluation shows that our approach is competitive with the lifted systems from the literature, though the ground systems achieve higher coverage.

Subject: AAAI.2026 - Planning, Routing, and Scheduling

#18 Experiential Fairness: Bridging the Gap Between User Experience and Resource-Centric Fairness in Online LLM Services [PDF] [Copy] [Kimi] [REL]

Authors: Jiahua Huang, Wentai Wu, Yongheng Liu, Guozhi Liu, Yang Wang, Weiwei Lin

Conventional fairness in multi-tenant Large Language Model (LLM) inference services is typically defined by system-centric metrics such as equitable resource allocation. We argue that this is unilateral and it creates a gap between measured system performance and actual user-perceived quality. We challenge this notion by introducing and formalizing Experiential Fairness, a user-centric paradigm that shifts the objective from equality of opportunity (resource access) to equity of outcome (user experience). With this motivation we propose ExFairS, a lightweight scheduling framework that perceives each user's satisfaction as a composite measure of Service Level Objective (SLO) compliance and resource consumption, and dynamically re-orders the serving queue guided by a credit-based priority mechanism. Extensive experiments on an 8-GPU NVIDIA V100 node show that ExFairS reduces the SLO violation rate by up to 100% and improves system throughput by 14-21.9%, outperforming state-of-the-art schedulers and delivering a demonstrably higher degree of Experiential Fairness.

Subject: AAAI.2026 - Planning, Routing, and Scheduling

#19 Hierarchical Reinforcement Learning with Topology-Aware Exploration Framework for Multi-path Commodity Flow Problem [PDF] [Copy] [Kimi] [REL]

Authors: Jingchen Jiang, Xuan Zhou, Jiayuan Li, Geng Han, Xiang Shi, Fang Deng

The multi-path commodity flow problem (MPCFP) is crucial for ensuring reliable and high-speed data transmission in communication networks. However, existing studies that employ pre-generated routing paths neglect real-time load state and the coupling among decisions, thus hindering the achievement of high-quality solutions. To overcome this, we propose Hierarchical Reinforcement Learning with Topology-Aware Exploration (HRL-TAE), which is the first fully end-to-end framework that dynamically produces high-quality solutions based on real-time network states. HRL-TAE integrates an exploration mechanism and utilizes the State Transition Guiding List (STGL) to guide state transitions, thereby transforming topology exploration into a Markov decision process. Guided by STGL, two closely coupled layers in HRL-TAE, that is, the path construct layer and the ratio allocate layer, construct multiple subpaths for each flow and allocate traffic ratios among them. Subsequently, adaptive constraint-driven masks exclude infeasible actions during decision making, thereby guaranteeing that all constraints are satisfied. We also adopt a tailored training approach to obtain accurate gradient estimates and improve training efficiency. Simulations and real-world experiments demonstrate that HRL-TAE achieves superior performance.

Subject: AAAI.2026 - Planning, Routing, and Scheduling

#20 Boosting Cross-problem Generalization in Diffusion-Based Neural Combinatorial Solver via Inference Time Adaptation [PDF] [Copy] [Kimi] [REL]

Authors: Haoyu Lei, Kaiwen Zhou, Yinchuan Li, Zhitang Chen, Farzan Farnia

Diffusion-based Neural Combinatorial Optimization (NCO) has demonstrated effectiveness in solving NP-complete (NPC) problems by learning discrete diffusion models for solution generation, eliminating hand-crafted domain knowledge. Despite their success, existing NCO methods face significant challenges in both cross-scale and cross-problem generalization, and high training costs compared to traditional solvers. While recent studies on diffusion models have introduced training-free guidance approaches that leverage pre-defined guidance functions for conditional generation, such methodologies have not been extensively explored in combinatorial optimization. To bridge this gap, we propose a training-free inference time adaptation framework (DIFU-Ada) that enables both the zero-shot cross-problem transfer and cross-scale generalization capabilities of diffusion-based NCO solvers without requiring additional training. We provide theoretical analysis that helps understanding the cross-problem transfer capability. Our experimental results demonstrate that a diffusion solver, trained exclusively on the Traveling Salesman Problem (TSP), can achieve competitive zero-shot transfer performance across different problem scales on TSP variants, such as Prize Collecting TSP (PCTSP) and the Orienteering Problem (OP), through inference time adaptation.

Subject: AAAI.2026 - Planning, Routing, and Scheduling

#21 Elite Pattern Reinforcement for Vehicle Routing Problems [PDF] [Copy] [Kimi] [REL]

Authors: Ning Li, Peng Lin, Peng Zhang, Ruichen Tian

Machine learning methods have been increasingly applied to solve Vehicle Routing Problems (VRPs). A high-efficiency approach is to learn solution construction using deep neural networks. However, their tendency toward premature convergence is a critical barrier, severely hindering generalization across diverse distributions and scales. To overcome this, we introduce Elite-Pattern Reinforcement (EPR), a novel strategy designed to create a synergy between the diverse, exploratory nature of reinforcement learning and the high-quality, structured knowledge from classical heuristics. The strategy guides the learning process by reinforcing structural patterns from elite solutions, employing an elite-guided score modulation to integrate this external knowledge. The inherent symmetry of path patterns is also exploited to augment the structural information. This steers the policy away from premature convergence by enabling it to distinguish and favour elite path patterns over inferior ones. Integrating our strategy with four construction methods yields substantial performance improvements on the CVRPLIB and TSPLIB benchmarks. Furthermore, our approach outperforms state-of-the-art learning-based methods, demonstrating superior generalization across diverse distributions and scales.

Subject: AAAI.2026 - Planning, Routing, and Scheduling

#22 Diverse Human Driving Vehicle Simulation in Background Traffic for Autonomous Driving Tests [PDF] [Copy] [Kimi] [REL]

Authors: Wendi Li, Hao Wu, Han Gao, Bing Mao, Fengyuan Xu, Sheng Zhong

Realistic background traffic is critical to the simulation platforms for autonomous driving (AD) testing. Given that most vehicles in reality are driven by human beings, introducing human driving (HD) vehicles to the background traffic is necessary to be able to discover more problems of the tested AD vehicle in the simulation stage. However, existing methods rely on ad-hoc rules or data-driven training to mimic partial human driver behaviors, which are not comprehensive and lack transparency. In this work, we design a smart human driving vehicle simulator HDSim which is empowered by cognitively inspired modeling and AI models. HDSim enables diverse, realistic, and scalable HD traffic simulation on AD testing platforms like CARLA in a non-intrusive manner. There are two novel components in HDSim. First, we introduce a driver model to guide the generation of diverse human driving styles by using different combinations of latent cognitive factors in a hierarchy. Second, we design a Perception-Mediated Behavior Influence (PMBI) mechanism to use LLM-assisted perceptual transformations to indirectly fuse driving actions with driving styles. Experiments show that HDSim traffic can help simulation platforms like CARLA to reveal 68% more failures of tested AD vehicles, and the explainability of reported accidents is also improved.

Subject: AAAI.2026 - Planning, Routing, and Scheduling

#23 Automated Repair of Totally-Ordered Hierarchical Task Network Domains via Context-Free Grammars with Large Language Model Support [PDF] [Copy] [Kimi] [REL]

Authors: Daniel Lutalo, Pascal Bercher

Repairing flawed domain models remains a critical challenge in AI planning, with few effective techniques available. We propose a novel approach for repairing totally ordered hierarchical task network (TO-HTN) models with missing actions, guided by a plan that must be valid for the repaired model. This problem has only one previously documented approach, which relies on complex re-encoding that's solved via TO-HTN planning. In contrast, our approach translates the repair task into a context-free grammar repair problem and leverages a large language model (LLM) to identify and insert relevant actions directly, simplifying the repair process. We evaluate our approach on established benchmarks and demonstrate substantially improved results over the prior approach, achieving nearly three times the number of instances solved, and nearly solving all instances of domains in which the previous approach solved zero. Importantly, we mask all natural language hints, such as action names, forcing the LLM to simulate reasoning and planning, and mitigating the risk of data leakage from its training corpus.

Subject: AAAI.2026 - Planning, Routing, and Scheduling

#24 Two Constraint Compilation Methods for Lifted Planning [PDF] [Copy] [Kimi] [REL]

Authors: Periklis Mantenoglou, Luigi Bonassi, Enrico Scala, Pedro Zuidberg Dos Martires

We study planning in a fragment of PDDL with qualitative state-trajectory constraints, capturing safety requirements, task ordering conditions, and intermediate sub-goals commonly found in real-world problems. A prominent approach to tackle such problems is to compile their constraints away, leading to a problem that is supported by state-of-the-art planners. Unfortunately, existing compilers do not scale on problems with a large number of objects and high-arity actions, as they necessitate grounding the problem before compilation. To address this issue, we propose two methods for compiling away constraints without grounding, making them suitable for large-scale planning problems. We prove the correctness of our compilers and outline their worst-case time complexity. Moreover, we present a reproducible empirical evaluation on the domains used in the latest International Planning Competition. Our results demonstrate that our methods are efficient and produce planning specifications that are orders of magnitude more succinct than the ones produced by compilers that ground the domain, while remaining competitive when used for planning with a state-of-the-art planner.

Subject: AAAI.2026 - Planning, Routing, and Scheduling

#25 Makespan Investigations of Sequential, Parallel, PO, and POCL Plans [PDF] [Copy] [Kimi] [REL]

Authors: Harrison Oates, Pascal Bercher

Modern planning systems utilize various plan representations - sequential, parallel, partially ordered (PO), and partial-order causal link (POCL) - each with different models for concurrency. These formalisms are often implicitly assumed to have the same base properties, particularly regarding makespan. We challenge this assumption, proving the relationship between them is fundamentally asymmetric. Our analysis shows conversions from plans with rigid concurrency layers (sequential, parallel) to those with flexible partial orders (PO, POCL) can preserve makespan. However, the reverse generally fails; the flexible orderings in PO/POCL plans can yield shorter makespans for solutions that cannot be represented in parallel plans without serialization. We prove that finding an optimal parallel representation for a given POCL plan is NP-complete, resolving a key question about their practical interchangeability. We also provide tight complexity bounds for makespan-bounded plan existence. Notably, our results disprove a claim in the literature that planning graph-based planners maximize concurrency by minimizing the critical path in derived PO plans.

Subject: AAAI.2026 - Planning, Routing, and Scheduling