AAAI.2026 - Reasoning under Uncertainty

Total: 21

#1 Temporal Properties of Conditional Independence in Dynamic Bayesian Networks

Authors: Rajab Aghamov, Christel Baier, Joël Ouaknine, Jakob Piribauer, Mihir Vahanwala, Isa Vialard

Dynamic Bayesian networks (DBNs) are compact graphical representations used to model probabilistic systems where interdependent random variables and their distributions evolve over time. In this paper, we study the verification of the evolution of conditional-independence (CI) propositions against temporal logic specifications. To this end, we consider two specification formalisms over CI propositions: linear temporal logic (LTL), and non-deterministic Büchi automata (NBAs). This problem has two variants. Stochastic CI properties take the given concrete probability distributions into account, while structural CI properties are viewed purely in terms of the graphical structure of the DBN. We show that deciding whether a stochastic CI proposition eventually holds is at least as hard as the Skolem problem for linear recurrence sequences, which is a long-standing open problem in number theory. On the other hand, we show that verifying the evolution of structural CI propositions against LTL and NBA specifications is in PSPACE, and is hard for both NP and coNP. We also identify natural restrictions on the graphical structure of the DBN that make the verification of structural CI properties tractable.

Subject: AAAI.2026 - Reasoning under Uncertainty


#2 Universal Learning of Stochastic Dynamics for Exact Belief Propagation Using Bernstein Normalizing Flows

Authors: Peter Amorese, Morteza Lahijanian

Predicting the distribution of future states in a stochastic system, known as belief propagation, is fundamental to reasoning under uncertainty. However, nonlinear dynamics often make analytical belief propagation intractable, requiring approximate methods. When the system model is unknown and must be learned from data, a key question arises: can we learn a model that (i) universally approximates general nonlinear stochastic dynamics, and (ii) supports analytical belief propagation? This paper establishes the theoretical foundations for a class of models that satisfy both properties. The proposed approach combines the expressiveness of normalizing flows for density estimation with the analytical tractability of Bernstein polynomials. Empirical results show the efficacy of our learned model over state-of-the-art data-driven methods for belief propagation, especially for highly nonlinear systems with non-additive, non-Gaussian noise.


#3 Instance Dependent Testing of Samplers Using Interval Conditioning

Authors: Rishiraj Bhattacharyya, Sourav Chakraborty, Yash Pote, Uddalok Sarkar, Sayantan Sen

Sampling algorithms play a pivotal role in probabilistic AI. However, verifying whether a sampler program indeed samples from the claimed distribution is a notoriously hard problem. Provably correct testers such as Barbarik, Teq, Flash, and Cubeprobe for different kinds of samplers were proposed only in the last few years. All of these testers focus on worst-case efficiency and do not support verification of samplers over infinite domains, a case that arises frequently in astronomy, finance, network security, and other areas. In this work, we design the first tester of samplers with instance-dependent efficiency, allowing us to test samplers over the natural numbers. Our tests are developed via a novel distance-estimation algorithm between an unknown and a known probability distribution using an 'interval conditioning' framework. The core technical contribution is a new connection with probability mass estimation of a continuous distribution. The practical gains are also substantial: our experiments establish up to 1000× speedup over state-of-the-art testers.


#4 Gaussian Approximation for Two-Timescale Linear Stochastic Approximation

Authors: Bogdan Butyrin, Artemy Rubtsov, Alexey Naumov, Vladimir V. Ulyanov, Sergey Samsonov

In this paper, we establish non-asymptotic bounds on the accuracy of normal approximation for linear two-timescale stochastic approximation (TTSA) algorithms driven by martingale-difference or Markov noise. Focusing on both the last-iterate and Polyak–Ruppert averaging regimes, we derive bounds for normal approximation in terms of the convex distance between probability distributions. Our analysis reveals a non-trivial interaction between the fast and slow timescales: the normal approximation rate for the last iterate improves as the timescale separation increases, while it decreases in the Polyak–Ruppert averaged setting. We also provide high-order moment bounds for the error of the linear TTSA algorithm, which may be of independent interest. Finally, we demonstrate that our theoretical results are directly applicable to reinforcement learning algorithms such as GTD and TDC.
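
The fast/slow interaction described in the abstract can be illustrated with a minimal deterministic sketch (noise omitted; the scalar dynamics and step sizes below are hypothetical, not the paper's setting): the fast iterate tracks the slow one, while the slow iterate converges using the tracked value.

```python
# Toy linear two-timescale recursion (deterministic, noise omitted).
# The fast iterate w tracks theta on the fast timescale (step b),
# while theta is driven toward 0 on the slow timescale (step a << b).
a, b = 0.01, 0.1
theta, w = 1.0, 0.0
for _ in range(3000):
    theta_next = theta - a * w        # slow update uses the tracked value
    w_next = w + b * (theta - w)      # fast update chases theta
    theta, w = theta_next, w_next
# Both iterates contract to the equilibrium (0, 0); a larger ratio b/a
# means w tracks theta more tightly, mirroring timescale separation.
```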


#5 Offline Multi-Objective Bandits: From Logged Data to Pareto-Optimal Policies

Authors: Ji Cheng, Song Lai, Shunyu Yao, Bo Xue

Offline policy learning from logged data is a critical paradigm for enabling effective decision-making without costly online exploration. However, its application has been largely confined to single-objective problems, a stark contrast to real-world scenarios where decision-making inherently involves navigating multiple, often conflicting, objectives. This paper introduces a comprehensive framework for Offline Multi-Objective Bandits (OffMOB), providing a principled solution to the fundamental challenge of learning Pareto-optimal policies from a static dataset. Our core contribution is a novel algorithm that uniquely integrates the pessimism principle with multi-objective optimization to safely learn from off-policy data. Crucially, our approach transcends the primary limitation of scalarization techniques, which are restricted to finding a single policy for a pre-defined preference. Instead, OffMOB directly approximates the entire Pareto front, learning a single, flexible policy model capable of generating an optimal action for any desired trade-off. To rigorously evaluate performance, we introduce the Tchebycheff sub-optimality metric and establish the first finite-sample generalization bounds for this problem class, proving that our algorithm converges to the true Pareto front under practical data coverage assumptions. Extensive experiments on complex benchmarks demonstrate that OffMOB significantly outperforms existing methods, identifying the complete set of optimal trade-offs where naive extensions fail.
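
The Tchebycheff idea mentioned in the abstract can be sketched concretely: a preference weight vector scores each action by its worst weighted gap to the component-wise ideal point, and sweeping the weights traces out different Pareto-optimal choices. The rewards and weights below are hypothetical, and this is a standard scalarization sketch, not the OffMOB algorithm itself.

```python
import numpy as np

def tchebycheff_gap(reward, weights, ideal):
    # Worst weighted shortfall of a reward vector from the ideal point.
    return np.max(weights * (ideal - reward))

# Three candidate actions, two objectives each (illustrative numbers).
rewards = np.array([[0.9, 0.20],
                    [0.5, 0.50],
                    [0.1, 0.95]])
ideal = rewards.max(axis=0)   # component-wise best achievable reward

def best_action(weights):
    # The action minimizing the Tchebycheff gap under this preference.
    return int(np.argmin([tchebycheff_gap(r, weights, ideal) for r in rewards]))

best_action(np.array([0.9, 0.1]))  # favors objective 0 -> action 0
best_action(np.array([0.1, 0.9]))  # favors objective 1 -> action 2
```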


#6 Learning from Imperfect Data: Robust Inference of Dynamic Systems Using Simulation-Based Generative Model

Authors: Hyunwoo Cho, Hyeontae Jo, Hyung Ju Hwang

System inference for nonlinear dynamic models, represented by ordinary differential equations (ODEs), remains a significant challenge in many fields, particularly when the data are noisy, sparse, or partially observable. In this paper, we propose a Simulation-based Generative Model for Imperfect Data (SiGMoID) that enables precise and robust inference for dynamic systems. The proposed approach integrates two key methods: (1) HyperPINN and (2) W-GAN. We demonstrate that SiGMoID quantifies data noise, estimates system parameters, and infers unobserved system components. Its effectiveness is validated by analyzing examples based on realistic experiments, showcasing its broad applicability in various domains, from scientific research to engineered systems, and enabling the discovery of full system dynamics.


#7 Interpolated Stochastic Interventions Based on Propensity Scores, Target Policies and Treatment-Specific Costs

Author: Johan de Aguas

We introduce two families of stochastic interventions with discrete treatments that connect causal modeling to cost-sensitive decision making. The interventions arise from a cost-penalized information projection of the independent product of the organic propensity scores and a reference policy, yielding closed-form Boltzmann-Gibbs couplings. The induced marginals define modified stochastic policies that interpolate smoothly, via a tilt parameter, from the organic law or from the reference law toward a product-of-experts limit when all destination costs are strictly positive. The first family recovers and extends incremental propensity score interventions, retaining identification without global positivity. For inference on the expected outcomes after these policies, we derive the efficient influence functions under a nonparametric model and construct one-step estimators. In simulations, the proposed estimators improve stability and robustness to nuisance misspecification relative to plug-in baselines. The framework can operationalize graded scientific hypotheses under realistic constraints. Because inputs are modular, analysts can sweep feasible policy spaces, prototype candidates, and align interventions with budgets and logistics before committing experimental resources.


#8 Conformal Prediction for Multi-Source Detection on a Network

Authors: Xingchao Jian, Purui Zhang, Lan Tian, Feng Ji, Wenfei Liang, Wee Peng Tay, Bihan Wen, Felix Krahmer

Detecting the origin of information or infection spread in networks is a fundamental challenge with applications in misinformation tracking, epidemiology, and beyond. We study the multi-source detection problem: given snapshot observations of node infection status on a graph, estimate the set of source nodes that initiated the propagation. Existing methods either lack statistical guarantees or are limited to specific diffusion models and assumptions. We propose a novel conformal prediction framework that provides statistically valid recall guarantees for source set detection, independent of the underlying diffusion process or data distribution. Our approach introduces principled score functions to quantify the alignment between predicted probabilities and true sources, and leverages a calibration set to construct prediction sets with user-specified recall and coverage levels. The method is applicable to both single- and multi-source scenarios, supports general network diffusion dynamics, and is computationally efficient for large graphs. Empirical results demonstrate that our method achieves rigorous coverage with competitive accuracy, outperforming existing baselines in both reliability and scalability.
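
The calibration step can be sketched with the standard split-conformal recipe (the score values and the specific quantile convention below are illustrative assumptions, not the paper's score functions): sort the scores that true sources received on a calibration set, pick a low quantile as the cutoff, and include every node at or above it.

```python
import numpy as np

def calibrate_threshold(cal_scores, alpha):
    # Split-conformal cutoff: if calibration and test scores are
    # exchangeable, a new true source's score falls below this
    # threshold with probability at most alpha.
    n = len(cal_scores)
    k = max(int(np.floor(alpha * (n + 1))) - 1, 0)  # zero-based order statistic
    return np.sort(cal_scores)[k]

def prediction_set(node_scores, tau):
    # Keep every node whose score reaches the calibrated cutoff.
    return {v for v, s in node_scores.items() if s >= tau}

cal = np.arange(1, 100) / 100                  # 99 calibration scores
tau = calibrate_threshold(cal, alpha=0.1)      # -> 0.10
sources = prediction_set({"a": 0.50, "b": 0.05, "c": 0.12}, tau)  # {"a", "c"}
```

Larger prediction sets trade precision for the recall guarantee; alpha controls that trade-off.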


#9 Learning-Augmented Ski Rental with Discrete Distribution: A Bayesian Approach

Authors: Bosun Kang, Hyejun Park, Chenglin Fan

We revisit the classic ski rental problem through the lens of Bayesian decision-making and machine-learned predictions. While traditional algorithms minimize worst-case cost without assumptions, and recent learning-augmented approaches leverage noisy forecasts with robustness guarantees, our work unifies these perspectives. We propose a discrete Bayesian framework that maintains exact posterior distributions over the time horizon, enabling principled uncertainty quantification and seamless incorporation of expert priors. Our algorithm achieves prior-dependent competitive guarantees and gracefully interpolates between worst-case and fully-informed settings. Our extensive experimental evaluation demonstrates superior empirical performance across diverse scenarios, achieving near-optimal results under accurate priors while maintaining robust worst-case guarantees. This framework naturally extends to incorporate multiple predictions, non-uniform priors, and contextual information, highlighting the practical advantages of Bayesian reasoning in online decision problems with imperfect predictions.
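
The posterior-maintenance idea can be sketched in a few lines. This is a simplified model with unit daily rent, a hypothetical uniform prior over the season length, and a greedy expected-cost rule; the paper's algorithm and its competitive analysis are more involved.

```python
import numpy as np

def posterior(prior, days_skied):
    # Condition a discrete prior over the season length on having
    # already skied this many days (the season must be at least that long).
    post = np.where(np.arange(len(prior)) >= days_skied, prior, 0.0)
    return post / post.sum()

def should_buy(prior, days_skied, buy_price):
    # Greedy rule: buy once the expected remaining rental cost
    # (at 1 per day) under the posterior exceeds the purchase price.
    post = posterior(prior, days_skied)
    horizons = np.arange(len(prior))
    expected_remaining = np.sum(post * np.maximum(horizons - days_skied, 0))
    return expected_remaining > buy_price

uniform_prior = np.full(20, 1 / 20)   # season lasts 0..19 days, uniform
should_buy(uniform_prior, days_skied=5, buy_price=4)    # True: E[remaining] = 7
should_buy(uniform_prior, days_skied=5, buy_price=10)   # False: keep renting
```

A sharper prior shifts the buy decision earlier or later, which is exactly the prior-dependent behavior the abstract describes.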


#10 Potential Outcome Rankings for Counterfactual Decision Making

Authors: Yuta Kawakami, Jin Tian

Counterfactual decision-making in the face of uncertainty involves selecting the optimal action from several alternatives using causal reasoning. Decision-makers often rank expected potential outcomes (or their corresponding utility and desirability) to compare the preferences of candidate actions. In this paper, we study new counterfactual decision-making rules by introducing two new metrics: the probabilities of potential outcome ranking (PoR) and the probability of achieving the best potential outcome (PoB). PoR reveals the most probable ranking of potential outcomes for an individual, and PoB indicates the action most likely to yield the top-ranked outcome for an individual. We then establish identification theorems and derive bounds for these metrics, and present estimation methods. Finally, we perform numerical experiments to illustrate the finite-sample properties of the estimators and demonstrate their application to a real-world dataset.


#11 Causal Inference Under Threshold Manipulation: Bayesian Mixture Modeling and Heterogeneous Treatment Effects

Authors: Kohsuke Kubota, Shonosuke Sugasawa

Many marketing applications, including credit card incentive programs, offer rewards to customers who exceed specific spending thresholds to encourage increased consumption. Quantifying the causal effect of these thresholds on customers is crucial for effective marketing strategy design. Although regression discontinuity design is a standard method for such causal inference tasks, its assumptions can be violated when customers, aware of the thresholds, strategically manipulate their spending to qualify for the rewards. To address this issue, we propose a novel framework for estimating the causal effect under threshold manipulation. The main idea is to model the observed spending distribution as a mixture of two distributions: one representing customers strategically affected by the threshold, and the other representing those unaffected. To fit the mixture model, we adopt a two-step Bayesian approach consisting of modeling non-bunching customers and fitting a mixture model to a sample around the threshold. We show posterior contraction of the resulting posterior distribution of the causal effect under large samples. Furthermore, we extend this framework to a hierarchical Bayesian setting to estimate heterogeneous causal effects across customer subgroups, allowing for stable inference even with small subgroup sample sizes. We demonstrate the effectiveness of our proposed methods through simulation studies and illustrate their practical implications using a real-world marketing dataset.


#12 High-Order Error Bounds for Markovian LSA with Richardson–Romberg Extrapolation

Authors: Ilya Levin, Alexey Naumov, Sergey Samsonov

In this paper, we study the bias and high-order error bounds of the Linear Stochastic Approximation (LSA) algorithm with Polyak-Ruppert (PR) averaging under Markovian noise. We focus on the version of the algorithm with constant step size and propose a novel decomposition of the bias via a linearization technique. We analyze the structure of the bias and show that the leading-order term is linear in the step size and cannot be eliminated by PR averaging. To address this, we apply the Richardson-Romberg (RR) extrapolation procedure, which effectively cancels the leading bias term. We derive high-order moment bounds for the RR iterates and show that the leading error term aligns with the asymptotically optimal covariance matrix of the vanilla averaged LSA iterates. We validate applicability of our findings for the temporal difference algorithm in reinforcement learning.
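
The Richardson–Romberg step itself is simple: run the procedure at step sizes a and 2a and combine the two estimates. The toy estimator below, with made-up bias coefficients, stands in for the averaged LSA iterate purely to show the linear bias term cancelling; it is not the paper's analysis.

```python
def rr_extrapolate(est_a, est_2a):
    # Combine estimates from step sizes a and 2a; a bias of the form
    # c*a cancels exactly, leaving only higher-order terms.
    return 2.0 * est_a - est_2a

theta_star = 1.5
def biased_estimate(step):
    # Hypothetical estimator with bias 0.8*step + 0.3*step**2,
    # standing in for the PR-averaged LSA iterate at that step size.
    return theta_star + 0.8 * step + 0.3 * step ** 2

a = 0.1
plain = biased_estimate(a)                                     # bias 0.083
rr = rr_extrapolate(biased_estimate(a), biased_estimate(2 * a))
# rr - theta_star = -2 * 0.3 * a**2 = -0.006: the linear term is gone
```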


#13 MetaGameBO: Hierarchical Game-Theoretic Driven Robust Meta-Learning for Bayesian Optimization

Authors: Hui Li, Huafeng Liu, Yiran Fu, Shuyang Lin, Baoxin Zhang, Deqiang Ouyang, Liping Jing, Jian Yu

Meta-learning for Bayesian optimization accelerates optimization by leveraging knowledge from previous tasks, but existing methods optimize for average performance and fail on the challenging outlier tasks that are critical in practice. These limitations become particularly severe when target tasks exhibit distribution shifts or when optimization budgets are limited in real-world applications. We introduce MetaGameBO, a hierarchical game-theoretic framework that formulates meta-learning as robust optimization through CVaR-based task selection and diversity-aware sample learning. Our approach incorporates uncertainty-aware adaptation via probabilistic embeddings and Thompson sampling for robust generalization to out-of-distribution targets. We establish theoretical guarantees, including convergence to game-theoretic equilibria and improved sample complexity, and demonstrate substantial improvements, with a 95.7% reduction in average loss and an 88.6% reduction in tail risk compared to state-of-the-art methods on challenging tasks and distribution shifts.


#14 Coarse-to-Fine Open-Set Graph Node Classification with Large Language Models

Authors: Xueqi Ma, Xingjun Ma, Sarah Monazam Erfani, Danilo Mandic, James Bailey

Developing open-set classification methods capable of classifying in-distribution (ID) data while detecting out-of-distribution (OOD) samples is essential for deploying graph neural networks (GNNs) in open-world scenarios. Existing methods typically treat all OOD samples as a single class, despite real-world applications—especially high-stakes settings like fraud detection and medical diagnosis—demanding deeper insights into OOD samples, including their probable labels. This raises a critical question: Can OOD detection be extended to OOD classification without true label information? To answer this question, we introduce a Coarse-to-Fine open-set Classification (CFC) method that leverages large language models (LLMs) for text-attributed graphs. CFC consists of three key components: (1) A coarse classifier that utilizes LLM prompts for OOD detection and outlier label generation; (2) A GNN-based fine classifier trained with OOD samples from (1) for enhanced OOD detection and ID classification; and (3) Refined OOD classification achieved through LLM prompts and post-processed OOD labels. Unlike methods relying on synthetic or auxiliary OOD samples, CFC employs semantic OOD data: instances that are genuinely out-of-distribution based on their inherent meaning, thus improving interpretability and practical utility. CFC enhances OOD detection by 10% compared to state-of-the-art approaches on text-attributed graphs and in the text domain, while achieving up to 70% accuracy in OOD classification on graph datasets.


#15 MTRL-CG: Multi-Task Reinforcement Learning Method with Spectral Clustering-Based Task Grouping

Authors: Wenjia Meng, Teng Zhang, Haoliang Sun, Yilong Yin

Multi-task reinforcement learning (RL) aims to enhance agent performance across multiple tasks by enabling effective knowledge transfer. However, existing methods typically adopt a fully shared policy across all tasks without explicitly distinguishing between related and conflicting ones, making them suffer from the negative interference issue, where updates beneficial to one task adversely affect others and degrade overall performance. In this paper, we propose a multi-task reinforcement learning method with spectral clustering-based task grouping (MTRL-CG), which leverages spectral clustering to group related tasks and separate conflicting ones, enabling group-wise policy learning to mitigate negative interference. We first quantify inter-task affinity by measuring the influence of task-specific updates on others within a shared model, and construct an affinity matrix to capture these relationships. Spectral clustering is then applied to partition tasks via spectral embedding and k-means clustering. Each task group is trained with a dedicated policy network to promote focused learning. Built upon the Soft Actor-Critic (SAC) algorithm, MTRL-CG can be readily integrated into existing SAC-based multi-task RL methods. Extensive experiments on the Meta-World benchmark demonstrate the effectiveness of the proposed MTRL-CG method.
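
The grouping step, given an affinity matrix, is the standard spectral clustering pipeline: normalized-Laplacian embedding followed by k-means. The block-structured affinity matrix below is a hypothetical stand-in for the update-influence measurements the abstract describes, and the tiny Lloyd loop replaces a library k-means for self-containedness.

```python
import numpy as np

def spectral_groups(affinity, k, seed=0):
    # Spectral embedding via the symmetric normalized Laplacian,
    # then a small k-means (Lloyd's algorithm) in embedding space.
    d = affinity.sum(axis=1)
    d_inv_sqrt = 1.0 / np.sqrt(np.maximum(d, 1e-12))
    lap = np.eye(len(affinity)) - d_inv_sqrt[:, None] * affinity * d_inv_sqrt[None, :]
    vals, vecs = np.linalg.eigh(lap)        # ascending eigenvalues
    emb = vecs[:, :k]                       # k smallest eigenvectors
    emb = emb / (np.linalg.norm(emb, axis=1, keepdims=True) + 1e-12)

    rng = np.random.default_rng(seed)
    centers = emb[rng.choice(len(emb), k, replace=False)]
    for _ in range(50):
        labels = np.argmin(((emb[:, None] - centers[None]) ** 2).sum(-1), axis=1)
        for j in range(k):
            if np.any(labels == j):         # keep empty clusters unchanged
                centers[j] = emb[labels == j].mean(axis=0)
    return labels

# Two blocks of mutually related tasks with weak cross-block affinity.
aff = np.array([[1.0, 0.9, 0.1, 0.1],
                [0.9, 1.0, 0.1, 0.1],
                [0.1, 0.1, 1.0, 0.9],
                [0.1, 0.1, 0.9, 1.0]])
labels = spectral_groups(aff, k=2)   # tasks {0,1} and {2,3} separate
```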


#16 Targeting in Multi-Criteria Decision Making

Authors: Nicolas Schwind, Patricia Everaere, Sébastien Konieczny, Emmanuel Lonca

In this work, we introduce the notion of targeting for multi-criteria decision making. The problem involves selecting the best alternatives related to one particular alternative, called the target. We use an axiomatic approach to this problem by establishing properties that any targeting method should satisfy. We present a representation theorem and show that satisfying the main properties of targeting requires aggregating the evaluations of the alternatives related to the target. We propose various candidate targeting methods and examine the properties satisfied by each method.


#17 Causal Structure Learning for Dynamical Systems with Theoretical Score Analysis

Authors: Nicholas Tagliapietra, Katharina Ensinger, Christoph Zimmer, Osman Mian

Real-world systems evolve in continuous time according to their underlying causal relationships, yet their dynamics are often unknown. Existing approaches to learning such dynamics typically either discretize time, leading to poor performance on irregularly sampled data, or ignore the underlying causality. We propose CADYT, a novel method for causal discovery on dynamical systems that addresses both challenges. In contrast to state-of-the-art causal discovery methods that model the problem using discrete-time Dynamic Bayesian networks, our formulation is grounded in difference-based causal models, which allow milder assumptions for modeling the continuous nature of the system. CADYT leverages exact Gaussian process inference for modeling the continuous-time dynamics, which is more aligned with the underlying dynamical process. We propose a practical instantiation that identifies the causal structure via a greedy search guided by the Algorithmic Markov Condition and the Minimum Description Length principle. Our experiments show that CADYT outperforms state-of-the-art methods on both regularly and irregularly sampled data, discovering causal networks closer to the true underlying dynamics.


#18 Bayesian Network Structural Consensus via Greedy Min-Cut Analysis

Authors: Pablo Torrijos, Jose M. Puerta, Juan A. Aledo, José A. Gámez

This paper presents the Min-Cut Bayesian Network Consensus (MCBNC) algorithm, a greedy method for structural consensus of Bayesian Networks (BNs), with applications in federated learning and model aggregation. MCBNC prunes weak edges from an initial unrestricted fusion using a structural score based on min-cut analysis, integrated into a modified Backward Equivalence Search (BES) phase of the Greedy Equivalence Search (GES) algorithm. The score quantifies edge support across input networks and is computed using max-flow. Unlike methods with fixed treewidth bounds, MCBNC introduces a pruning threshold θ that can be selected post hoc using only structural information. Experiments on real-world BNs show that MCBNC yields sparser, more accurate consensus structures than both canonical fusion and the input networks. The method is scalable, data-agnostic, and well-suited for distributed or federated structural learning of BNs or causal discovery.


#19 Robust Causal Discovery Under Imperfect Structural Constraints

Authors: Zidong Wang, Xi Lin, Chuchao He, Xiaoguang Gao

Robust causal discovery from observational data under imperfect prior knowledge remains a significant and largely unresolved challenge. Existing methods typically presuppose perfect priors or handle only specific, pre-identified error types, and their performance degrades substantially when confronted with flawed constraints of unknown location and type. This decline arises because most of them rely on inflexible and biased thresholding strategies that may conflict with the data distribution. To overcome these limitations, we propose to harmonize knowledge and data through prior alignment and conflict resolution. First, we assess the credibility of imperfect structural constraints through a surrogate model, which then guides a sparse penalization term measuring the loss between the learned and constrained adjacency matrices. We theoretically prove that, under an ideal assumption, the knowledge-driven objective aligns with the data-driven objective. Furthermore, to resolve conflicts when this assumption is violated, we introduce a multi-task learning framework optimized via multi-gradient descent, jointly minimizing both objectives. The proposed method is robust in both linear and nonlinear settings. Extensive experiments, conducted under diverse noise conditions and structural equation model types, demonstrate the effectiveness and efficiency of our method under imperfect structural constraints.


#20 Multiple-play Stochastic Bandits with Prioritized Arm Capacity Sharing

Authors: Hong Xie, Haoran Gu, Yanying Huang, Tao Tan, Defu Lian

This paper proposes a variant of multiple-play stochastic bandits tailored to resource allocation problems arising from LLM applications, edge intelligence, etc. The model is composed of a finite number of arms and plays. Each arm has a stochastic number of capacity units, and each unit of capacity is associated with a reward function. Each play is associated with a priority weight. When multiple plays compete for an arm's capacity, the capacity is allocated in a largest-priority-weight-first manner. Instance-independent and instance-dependent regret lower bounds are proved, revealing the impact of model parameters on the hardness of learning the optimal allocation policy. When model parameters are given, we design an algorithm named MSB-PRS-OffOpt to locate the optimal play allocation policy with computational complexity polynomial in the number of arms and plays. Utilizing MSB-PRS-OffOpt as a subroutine, an approximate upper confidence bound (UCB) based algorithm is designed, which has instance-independent and instance-dependent regret upper bounds matching the corresponding lower bounds up to acceptable factors. To this end, we address nontrivial technical challenges arising from optimizing and learning under a special nonlinear combinatorial utility function induced by the prioritized resource sharing mechanism.


#21 Exploring Non-Convex Discrete Energy Landscapes: An Efficient Langevin-Like Sampler with Replica Exchange

Authors: Haoyang Zheng, Hengrong Du, Ruqi Zhang, Guang Lin

Gradient-based Discrete Samplers (GDSs) are effective for sampling discrete energy landscapes. However, they often stagnate in complex, non-convex settings. To improve exploration, we introduce the Discrete Replica EXchangE Langevin (DREXEL) sampler and its variant with Adjusted Metropolis (DREAM). These samplers use two GDSs at different temperatures and step sizes: one focuses on local exploitation, while the other explores broader energy landscapes. When energy differences are significant, sample swaps occur, governed by a mechanism tailored for discrete sampling to ensure detailed balance. Theoretically, we prove that the proposed samplers satisfy detailed balance and converge to the target distribution under mild conditions. Experiments across 2D synthetic simulations, sampling from Ising models and restricted Boltzmann machines, and training deep energy-based models further confirm their efficiency in exploring non-convex discrete energy landscapes.
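
The swap mechanism between the two chains follows the classic replica-exchange acceptance rule; the sketch below uses the standard Metropolis form with hypothetical inverse temperatures and energies, not the paper's discrete-sampling-specific mechanism.

```python
import math

def swap_log_alpha(beta_cold, beta_hot, energy_cold, energy_hot):
    # Log acceptance ratio for swapping states between a cold chain
    # (large beta, local exploitation) and a hot chain (small beta,
    # broad exploration).
    return (beta_cold - beta_hot) * (energy_cold - energy_hot)

def swap_prob(beta_cold, beta_hot, energy_cold, energy_hot):
    # Accept with probability min(1, exp(log_alpha)); this Metropolis
    # form preserves detailed balance for the pair of chains.
    la = swap_log_alpha(beta_cold, beta_hot, energy_cold, energy_hot)
    return 1.0 if la >= 0 else math.exp(la)

# Cold chain stuck at high energy, hot chain found a low-energy state:
# the swap is always accepted, rescuing the cold chain.
p_rescue = swap_prob(2.0, 0.5, energy_cold=5.0, energy_hot=1.0)   # 1.0
# Swapping a good cold state for a worse hot one is rarely accepted.
p_bad = swap_prob(2.0, 0.5, energy_cold=1.0, energy_hot=5.0)      # exp(-6)
```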
