We present NoReGeo, a novel benchmark designed to evaluate the intrinsic geometric understanding of large language models (LLMs) without relying on reasoning or algebraic computation. Unlike existing benchmarks that primarily assess models' proficiency in reasoning-based geometry, where solutions are derived using algebraic methods, NoReGeo focuses on evaluating whether LLMs can inherently encode spatial relationships and recognize geometric properties directly. Our benchmark comprises 2,500 trivial geometric problems spanning 25 categories, each carefully crafted to be solvable purely through native geometric understanding, assuming known object locations. We assess a range of state-of-the-art models on NoReGeo, including frontier models like GPT-4, observing that even the most advanced systems achieve at most 65% accuracy on binary classification tasks. Further, our ablation experiments demonstrate that such geometric understanding does not emerge through fine-tuning alone, indicating that effective training for geometric comprehension requires a specialized approach from the outset. Our findings highlight a significant gap in current LLMs' ability to natively grasp geometric concepts, providing a foundation for future research toward models with true geometric cognition.
Recently, there has been an increasing interest in extending Dung's framework with probability theory, leading to the Probabilistic Argumentation Framework (PAF), and with supports in addition to attacks, leading to the Bipolar Argumentation Framework (BAF). In this paper, we introduce the Conditional Probabilistic Bipolar Argumentation Framework (CPBAF), which extends Probabilistic and Bipolar AF by allowing conditional probabilities on arguments, attacks, and on (possibly cyclic) supports. In this setting, we address the problem of computing the probability that a given argument is accepted. This is carried out by introducing the concept of probabilistic explanation for a given (probabilistic) extension. We show that the complexity of the problem is FP^#P-hard and propose polynomial approximation algorithms with bounded additive error for CPBAF where cycles with an odd number of attacks are forbidden.
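To make the counting flavour of the acceptance problem concrete, the following is a minimal, illustrative sketch of the possible-worlds reading for a plain probabilistic argumentation framework (no supports, no conditional probabilities, none of the CPBAF machinery from the paper), assuming independent argument probabilities and grounded acceptance; the names `grounded_extension` and `acceptance_probability` are hypothetical, not from the paper.

```python
from itertools import combinations

def grounded_extension(args, attacks):
    """Least fixpoint of the characteristic function: iteratively collect
    the arguments all of whose attackers are counter-attacked."""
    ext = set()
    while True:
        defended = {a for a in args
                    if all(any((d, b) in attacks for d in ext)
                           for b in args if (b, a) in attacks)}
        if defended == ext:
            return ext
        ext = defended

def acceptance_probability(target, probs, attacks):
    """Sum, over every possible world (subset of arguments), the probability
    of that world times the indicator that the target belongs to the grounded
    extension of the induced sub-framework."""
    args = list(probs)
    total = 0.0
    for r in range(len(args) + 1):
        for world in combinations(args, r):
            present = set(world)
            p = 1.0
            for a in args:
                p *= probs[a] if a in present else 1 - probs[a]
            sub = {(x, y) for (x, y) in attacks if x in present and y in present}
            if target in present and target in grounded_extension(present, sub):
                total += p
    return total
```

The exponential sum over worlds is exactly what makes the exact problem #P-hard in general, and what the paper's bounded-error approximation algorithms avoid.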
Assessing the strength of arguments is essential for determining the outcomes of any argument-based system. A wide range of semantics has been proposed in the literature. These take as input a set of arguments—each assigned a basic weight and potentially subject to attacks from others—and compute a single strength value for each argument. Despite the diversity of argument types (or schemes), existing semantics apply uniform evaluation criteria across all arguments. In this paper, we advocate for type-dependent evaluations, acknowledging that the impact of attacks can vary across types. Given that many argument-based systems involve heterogeneous types of arguments, we propose a broad family of hybrid semantics that combine distinct base semantics, each tailored to specific argument types. We investigate their theoretical properties, present concrete instances within this family, and examine their computational complexity.
A fundamental use of knowledge bases (KBs) is query answering, i.e., retrieving the information entailed by the KB in response to a user query. When both the KB and the query are specified as logical formulae, the standard form of answer provided to users is the set of all certain answers (CAs): tuples of constants that satisfy the formula defining the query in every model of the logical theory defining the KB. Despite their wide adoption, CAs are known to be just a lossy representation of the information that a KB and a query provide. While several alternative answer languages have been proposed in the literature, no general consensus has emerged on the most suitable approach to query answering over ontological KBs, as each language comes with its own limitations. To address some of these issues, we introduce Regularly Recurrent Answers (RRAs), a novel answer language for queries over ontological KBs based on regular expressions. RRAs support the representation of infinite sets of tuples of constants via a simple (and arguably well understood) generation mechanism. We show that RRAs can capture a fundamental fragment of the certain information entailed by unions of conjunctive queries and DL-Lite KBs, making them a strong candidate for informative query answering settings. Our contribution includes the formal definition of RRAs, a proof of their informativeness, and a study of the computational complexity of the query answering problem using RRAs.
Computational argumentation studies fundamental methods for reasoning within Artificial Intelligence (AI). Two prominent subfields in computational argumentation are abstract argumentation and structured argumentation. Abstract argumentation focuses on the interactions between arguments, ignoring their internal structure, while structured approaches utilize a given knowledge base to construct the arguments. Thus, the latter approach incorporates the internal structure of arguments into the reasoning process. In this work, we introduce a form of abstraction on the well-established structured approach of Assumption-Based Argumentation (ABA). Our goal is to provide methods to simplify complicated scenarios by applying clustering over defeasible parts. Abstraction, particularly clustering, has been explored in recent research on abstract argumentation and in the adjacent field of logic programming. In fact, while clustering has also been applied to ABA, our approach takes a different, or rather dual, direction. In contrast to prior work on over-approximation in ABA, we propose the dual approach of under-approximation. We provide semantics for reasoning over clustered frameworks in a manner that is sound relative to the original semantics, ensuring that any set deemed acceptable in the clustered scenario corresponds to an acceptable set. We show properties of the under-approximating semantics and illustrate our approach using a conceptual example based on medical recommendations.
As the use of AI in society grows, addressing emerging biases is essential to prevent systematic discrimination. Several bias detection methods have been proposed, but, with few exceptions, these tend to ignore transparency. Yet interpretability and explainability are core requirements for algorithmic fairness, even more so than for other algorithmic solutions, given the human-oriented nature of fairness. We present ABIDE (Argumentative BIas detection by DEbate), a novel framework that transparently structures bias detection as a debate, guided by an underlying argument graph as understood in (formal and computational) argumentation. The arguments concern the success chances of groups in local neighbourhoods and the significance of these neighbourhoods. We evaluate ABIDE experimentally and demonstrate that it outperforms an argumentative baseline.
In this paper, we study the data complexity of querying inconsistent weighted description logic (DL) knowledge bases under recently-introduced cost-based semantics. In a nutshell, the idea is to assign each interpretation a cost based upon the weights of the violated axioms and assertions, and certain and possible query answers are determined by considering all (resp. some) interpretations having optimal or bounded cost. Whereas the initial study of cost-based semantics focused on DLs between EL_bot and ALCO, we consider DLs that may contain inverse roles and role inclusions, thus covering prominent DL-Lite dialects. Our data complexity analysis goes significantly beyond existing results by sharpening several lower bounds and pinpointing the precise complexity of optimal-cost certain answer semantics (no non-trivial upper bound was known). Moreover, while all existing results show the intractability of cost-based semantics, our most challenging and surprising result establishes that if we consider DL-Lite^H_bool ontologies and a fixed cost bound, certain answers for instance queries and possible answers for conjunctive queries can be computed using first-order rewriting and thus enjoy the lowest possible data complexity (AC0).
Deep Learning techniques are nowadays pervasive in AI. However, these approaches lack transparency for justifying their output and for helping users trust their decisions. For these reasons, alternative approaches to learning deserve to be explored, either for developing new tools with autonomous learning capability or for explaining the results of black-box predictors. Among them, an important role has been played since the nineties by Inductive Logic Programming and, more recently, by the approaches of Learning from Answer Sets (LAS). Computing inductive solutions for LAS tasks is known to be Sigma^P_2-hard. In this work, we tackle this problem using a single-shot disjunctive ASP encoding based on the saturation technique originally proposed by Eiter and Gottlob. We prove that, when the background knowledge and hypothesis space form a tight program (a syntactic property), our encoding is linear in the size of the task. This approach contrasts with the state-of-the-art ILASP system, which relies on multiple iterative calls to an ASP solver; ours can instead be evaluated directly by modern disjunctive ASP solvers, leveraging decades of research and optimization in the ASP community. We implement our method in a system named LASCO. Experimental results on a diverse set of benchmarks demonstrate that LASCO outperforms all versions of ILASP on many instances and scales well on multi-threaded machines.
In runtime verification, monitoring consists of analyzing the current execution of a system and determining, on the basis of the observed finite trace, whether all its possible continuations satisfy or violate a given specification. This is typically done by synthesizing a monitor, often a Deterministic Finite State Automaton (DFA), from logical specifications expressed in Linear Temporal Logic (LTL) or in its finite-word variant (LTLf). Unfortunately, the size of the resulting DFA may incur a doubly exponential blow-up in the size of the formula. In this paper, we identify some conditions under which monitoring can be done without constructing such a DFA. We build on the notion of intentionally safe and cosafe formulas to show that monitoring of these formulas can be carried out through trace-checking, that is, by directly evaluating them on the current system trace, with a polynomial complexity in the size of both the trace and the formula. In addition, we investigate the complexity of recognizing intentionally safe and cosafe formulas for the safety and cosafety fragments of LTL and LTLf. As for LTLf, we show that all formulas in these fragments are intentionally safe and cosafe, thus removing the need for the check. As for LTL, we prove that the problem is in PSPACE, significantly improving over the EXPSPACE complexity of full LTL.
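The trace-checking idea above can be illustrated with a small recursive LTLf evaluator over finite traces; this is a generic sketch, not the paper's algorithm, and the naive recursion shown here would need memoization to meet the polynomial bound the paper discusses. Formulas are nested tuples, traces are lists of sets of atoms, and `X` is the strong next (false at the last position).

```python
def holds(f, trace, i=0):
    """Evaluate an LTLf formula (nested tuples) on a finite trace at position i."""
    op = f[0]
    if op == 'atom':
        return f[1] in trace[i]
    if op == 'not':
        return not holds(f[1], trace, i)
    if op == 'and':
        return holds(f[1], trace, i) and holds(f[2], trace, i)
    if op == 'or':
        return holds(f[1], trace, i) or holds(f[2], trace, i)
    if op == 'X':  # strong next: false at the last position
        return i + 1 < len(trace) and holds(f[1], trace, i + 1)
    if op == 'G':  # f[1] holds at every remaining position
        return all(holds(f[1], trace, j) for j in range(i, len(trace)))
    if op == 'F':  # f[1] holds at some remaining position
        return any(holds(f[1], trace, j) for j in range(i, len(trace)))
    if op == 'U':  # f[2] eventually holds, with f[1] holding until then
        return any(holds(f[2], trace, j) and
                   all(holds(f[1], trace, k) for k in range(i, j))
                   for j in range(i, len(trace)))
    raise ValueError(f"unknown operator {op!r}")
```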
Competency Questions (CQs) play a crucial role in validating ontology design. While manually crafting CQs can be highly time-consuming and costly for ontology engineers, recent studies have explored the use of large language models (LLMs) to automate this process. However, prior approaches have largely evaluated generated CQs based on their similarity to existing datasets, which often fails to verify semantic pitfalls such as “Misusing allValuesFrom”. Since such pitfalls cannot be reliably detected through rule-based methods, we propose a novel dataset and model, Validating Semantic Pitfalls in Ontology (VSPO), for CQ generation specifically designed to verify semantic pitfalls. To simulate missing and misused axioms, we use an LLM to generate natural language definitions of classes and properties and introduce misalignments between the definitions and the ontology by removing axioms or altering logical operators (e.g., substituting union with intersection). We then fine-tune LLaMA-3.1-8B-Instruct to generate CQs that validate these semantic discrepancies between the provided definitions and the corresponding axioms. The resulting CQs can detect a broader range of modeling errors compared to existing public datasets. Our fine-tuned model demonstrates superior performance over baselines, showing 26% higher precision and 28.2% higher recall than GPT-4.1 in generating CQs for pitfall validation. This research enables automatic generation of TBox-validating CQs using LLMs, significantly reducing manual effort while improving semantic alignment between ontologies and expert knowledge. To the best of our knowledge, this is the first study to target semantic pitfall validation in CQ generation using LLMs.
Spatial representation learning is fundamental to GeoAI applications, including urban analytics, as it encodes the shapes, locations, and spatial relationships (topological and distance-based) of geo-entities such as points, polylines, and polygons. Existing methods either target a single geo-entity type or, like Poly2Vec, decompose entities into simpler components to enable Fourier transformation, introducing high computational cost. Moreover, since the transformed space lacks geometric alignment, these methods rely on uniform, non-adaptive sampling, which blurs fine-grained features like edges and boundaries. To address these limitations, we introduce Geo2Vec, a novel method inspired by signed distance fields (SDF) that operates directly in the original space. Geo2Vec adaptively samples points and encodes their signed distances (positive outside, negative inside), capturing geometry without decomposition. A neural network trained to approximate the SDF produces compact, geometry-aware, and unified representations for all geo-entity types. Additionally, we propose a rotation-invariant positional encoding to model high-frequency spatial variations and construct a structured and robust embedding space for downstream GeoAI models. Empirical results show that Geo2Vec consistently outperforms existing methods in representing shape and location, capturing topological and distance relationships, and achieving greater efficiency in real-world GeoAI applications.
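The signed-distance convention described above (positive outside, negative inside) can be sketched for a simple polygon in pure Python; this is an illustrative SDF sampler under the abstract's stated convention, not the Geo2Vec implementation, and the function names are hypothetical.

```python
import math

def _seg_dist(p, a, b):
    """Euclidean distance from point p to the segment ab."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    if dx == 0 and dy == 0:
        return math.hypot(px - ax, py - ay)
    # Clamp the projection of p onto the segment's supporting line.
    t = max(0.0, min(1.0, ((px - ax) * dx + (py - ay) * dy) / (dx * dx + dy * dy)))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))

def _inside(p, poly):
    """Even-odd ray-casting point-in-polygon test."""
    x, y = p
    inside = False
    for i in range(len(poly)):
        (x1, y1), (x2, y2) = poly[i], poly[(i + 1) % len(poly)]
        if (y1 > y) != (y2 > y) and x < x1 + (y - y1) * (x2 - x1) / (y2 - y1):
            inside = not inside
    return inside

def signed_distance(p, poly):
    """Distance from p to the polygon boundary: positive outside, negative inside."""
    d = min(_seg_dist(p, poly[i], poly[(i + 1) % len(poly)])
            for i in range(len(poly)))
    return -d if _inside(p, poly) else d
```

A network trained to regress `signed_distance` at adaptively sampled points would then carry the geometry implicitly, which is the intuition behind SDF-based representations.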
More and more organizations are relying on Machine Learning (ML) models to support internal decision-making processes. To better support such processes, it would be highly beneficial to contextualize the inductively acquired knowledge encoded in these models and enable formal reasoning over it. Despite significant progress in Neural-Symbolic AI, this specific challenge remains largely under-explored. We propose a framework that makes it possible to integrate the knowledge induced by ML classifiers with the knowledge specified by logic-based formalisms. The framework is based on the novel notion of Hybrid Knowledge Base (HKB), consisting of two components: an ontology and a set of ML binary classifiers. As usual, the ontology provides an intensional representation of the modeled domain through logic-based axioms, while the binary classifiers implicitly encode the extensional knowledge. Specifically, an HKB associates with each concept and role mentioned in the ontology a classifier based on a set of features deemed relevant for the application domain, thereby virtually populating the concepts and roles with the instances and pairs of instances from the feature space. Besides the definition of the new framework, as a more technical contribution we show how to reason in this framework by studying query answering over HKBs. In particular, we investigate the computational complexity of query answering in a rich language over HKBs in which the ontology is specified in (the Description Logic counterpart of) RDFS, while the binary classifiers are represented by Multi-Layer Perceptrons.
The goal of inductive logic programming (ILP) is to find a set of logical rules that generalises training examples and background knowledge. We introduce an ILP approach that identifies pointless rules. A rule is pointless if it contains a redundant literal or cannot discriminate against negative examples. We show that ignoring pointless rules allows an ILP system to soundly prune the hypothesis space. Our experiments on multiple domains, including visual reasoning and game playing, show that our approach can reduce learning times by 99% whilst maintaining predictive accuracies.
The goal of inductive logic programming (ILP) is to search for a hypothesis that generalises training data and background knowledge. The challenge is searching vast hypothesis spaces, which is exacerbated because many logically equivalent hypotheses exist. To address this challenge, we introduce a method to break symmetries in the hypothesis space. We implement our idea in answer set programming. Our experiments on multiple domains, including visual reasoning and game playing, show that our approach can reduce solving times from over an hour to just 17 seconds.
Analogical reasoning is a powerful inductive mechanism, widely used in human cognition and increasingly applied in artificial intelligence. Formal frameworks for analogical inference have been developed for Boolean domains, where inference is provably sound for affine functions and approximately correct for functions close to affine. These results have informed the design of analogy-based classifiers. However, they do not extend to regression tasks or continuous domains. In this paper, we revisit analogical inference from a foundational perspective. We first present a counterexample showing that existing generalization bounds fail even in the Boolean setting. We then introduce a unified framework for analogical reasoning in real-valued domains based on parameterized analogies defined via generalized means. This model subsumes both Boolean classification and regression, and supports analogical inference over continuous functions. We characterize the class of analogy-preserving functions in this setting and derive both worst-case and average-case error bounds under smoothness assumptions. Our results offer a general theory of analogical inference across discrete and continuous domains.
The ASP(Q) language extends Answer Set Programming (ASP) with quantifiers that operate over answer sets. Thus, ASP(Q) facilitates a more natural encoding of problems whose complexity exceeds NP within the ASP framework. In this paper we focus on ASP(Q) programs with two quantifiers, i.e., 2-ASP(Q) programs, which can be used to model problems at the second level of the Polynomial Hierarchy. In particular, we propose an approach for evaluating 2-ASP(Q) programs that is inspired by Counterexample-Guided Abstraction Refinement (CEGAR). Unlike existing state-of-the-art ASP(Q) solvers, which are typically based on QBF solvers, our new approach leverages ASP solvers and suffers no overhead from translating ASP(Q) into QBF. Experimental results demonstrate that our technique consistently outperforms state-of-the-art ASP(Q) solvers across benchmark problems located at the second level of the polynomial hierarchy.
Qualitative spatial representation approaches that rely on Goodman-style predicative mereological theories and on a pseudo-topology often cause problems when used as meta-information for knowledge conceptualization in advanced geometric reasoning, since they lack Euclidean geometry and fully-fledged topological spaces in the classical sense. This paper therefore extends an existing formalization, grounded in an underlying type theory using the Coq language, with a Whitehead-like point-free version of Tarski's geometry. More precisely, we leverage an available library called lambda-MM to formalize Tarski's geometry of solids by investigating an algebraic formulation of topological relations on top of the library. Given that Tarski's work is grounded in Lesniewski's mereology, and since lambda-MM barely implements Tarski's geometry, the first part of the paper supplements that work by proving that mereological classes correspond to regular open sets; this yields a topology of individual names that is extensible with Tarski's geometric primitives. Unlike classical approaches used in qualitative logical theories, we adopt a solution that enables the specification of a topological space from mereology and of a geometric subspace, thereby enhancing the theory's expressiveness. In the second part, we prove that Tarski's geometry forms a subspace of the previous topology in which regions are restricted classes. We prove three postulates of Tarski's system, thereby reducing his axiomatization, and extend the theory with the T2 (Hausdorff) property and additional definitions.
We investigate the problem of synthesizing strategies that guarantee the successful execution of a high-level nondeterministic agent program in Golog within a nondeterministic first-order basic action theory, considering the environment as adversarial. Our approach constructs a symbolic program graph that captures the control flow independently of the domain, enabling strategy synthesis through the cross product of the program graph with the domain model. We formally relate graph-based transitions to standard Golog semantics and provide a synthesis procedure that is sound though incomplete (in general, the problem is undecidable, given that we have a first-order representation of the state). We also extend the framework to handle the case where the environment's possible behaviors are specified by a Golog program.
Knowledge graph reasoning in the fully-inductive setting—where both entities and relations at test time are unseen during training—remains an open challenge. In this work, we introduce GraphOracle, a novel framework that achieves robust fully-inductive reasoning by transforming each knowledge graph into a Relation-Dependency Graph (RDG). The RDG encodes directed precedence links between relations, capturing essential compositional patterns while drastically reducing graph density. Conditioned on a query relation, a multi-head attention mechanism propagates information over the RDG to produce context-aware relation embeddings. These embeddings then guide a second GNN to perform inductive message passing over the original knowledge graph, enabling prediction on entirely new entities and relations. Comprehensive experiments on 60 benchmarks demonstrate that GraphOracle outperforms prior methods by up to 25% in fully-inductive and 28% in cross-domain scenarios. Our analysis further confirms that the compact RDG structure and attention-based propagation are key to efficient and accurate generalization.
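One plausible reading of the relation-precedence idea is sketched below: a directed RDG edge r1 → r2 whenever some entity is the tail of an r1-triple and the head of an r2-triple, so that r2 can follow r1 along a path. The abstract does not specify the exact construction, so this particular precedence rule and the function name are assumptions for illustration.

```python
from collections import defaultdict

def relation_dependency_graph(triples):
    """Build RDG edges (r1, r2) whenever an r2-triple can compositionally
    follow an r1-triple, i.e. the tail of an r1-triple is the head of an
    r2-triple. Works purely on (head, relation, tail) tuples."""
    heads = defaultdict(set)  # entity -> relations of which it is the head
    for h, r, t in triples:
        heads[h].add(r)
    edges = set()
    for h, r, t in triples:
        for r2 in heads[t]:
            edges.add((r, r2))
    return edges
```

The resulting graph has one node per relation rather than per entity, which is why it is far denser-information yet sparser in size than the original knowledge graph.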
Multi-agent epistemic planning (MEP) is the task of generating action sequences that achieve goals specified over both the physical world and agents’ mental states. It plays an important role in research domains such as game theory, computational economics, and cognitive science. While dynamic epistemic logic (DEL) provides an expressive framework for MEP, it requires complete, model-based specifications of the initial state and action effects, and suffers from undecidability due to the unbounded nesting of beliefs. In this work, we propose a modal variant of the situation calculus that captures much of the expressive power of the DEL approach. Inspired by the cognitive concept Theory of Mind (ToM), we introduce action theories with hierarchical structures, allowing agents to reason about other agents' action theories up to bounded depths. We develop a regression method that reduces reasoning about future states to reasoning about the initial state. By preserving bounded-order ToM throughout the regression process, our approach ensures the decidability of the planning problem. Finally, we propose an algorithm to find the optimal solution, namely, to find the shortest action sequence that achieves the goal.
Linear Temporal Logic on Finite Traces (LTLf) is a popular logic to express declarative specifications in Artificial Intelligence (AI). The recent call for explainable AI tools has made relevant the problem of computing efficiently minimal unsatisfiable cores (MUCs) and minimal correction sets (MCSes) of LTLf formulas. Recent work has focused on the extraction of MUCs on formulas in conjunctive form. In this paper, we present a method that operates on arbitrary formulas and computes a more refined notion of MUCs, as introduced by Schuppan, along with the corresponding notion of MCSes. Experiments show that our system, based on Answer Set Programming, outperforms available tools.
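For intuition on what an unsatisfiable-core computation involves, here is the classic deletion-based MUC extraction loop on a toy propositional encoding; this is a textbook baseline, not Schuppan's refined MUC notion nor the paper's ASP-based method, and the brute-force `unsat` oracle is only for illustration.

```python
from itertools import product

def unsat(clauses):
    """Brute-force propositional unsatisfiability check; clauses are
    sets of nonzero integer literals (negative = negated variable)."""
    vars_ = sorted({abs(l) for c in clauses for l in c})
    for bits in product([False, True], repeat=len(vars_)):
        assign = dict(zip(vars_, bits))
        if all(any(assign[abs(l)] == (l > 0) for l in c) for c in clauses):
            return False  # found a satisfying assignment
    return True

def minimal_unsat_core(clauses, unsat_oracle):
    """Deletion-based MUC: try dropping each member; keep the drop only
    if the remainder is still unsatisfiable."""
    core = list(clauses)
    i = 0
    while i < len(core):
        trial = core[:i] + core[i + 1:]
        if unsat_oracle(trial):
            core = trial
        else:
            i += 1
    return core
```

The same deletion loop with "satisfiable" and "unsatisfiable" roles swapped yields minimal correction sets, which is the duality the paper exploits.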
Market making (MM) through Reinforcement Learning (RL) has attracted significant attention in financial trading. With the development of Large Language Models (LLMs), more and more attempts are being made to apply LLMs to financial areas. A simple, direct application of an LLM as an agent achieves strong performance, but such methods are hindered by slow inference speed, and most current research has not studied LLM distillation for this specific task. To address this, we first propose the normalized fluorescent probe to study the mechanism of the LLM's features. Based on the observations from our investigation, we propose Cooperative Market Making (CMM), a novel framework that decouples LLM features across three orthogonal dimensions: layer, task, and data. Various student models collaboratively learn simple LLM features along these different dimensions, with each model responsible for a distinct feature, to achieve knowledge distillation. Furthermore, CMM introduces a Hájek-MoE to integrate the outputs of the student models by investigating the contribution of different models in a kernel-function-generated common feature space. Extensive experimental results on four real-world market datasets demonstrate the superiority of CMM over current distillation methods and RL-based market-making strategies.
Automata learning has many applications in artificial intelligence and software engineering. Central to these applications is the L* algorithm, introduced by Angluin (1987). The L* algorithm learns deterministic finite-state automata (DFAs) in polynomial time when provided with a minimally adequate teacher. Unfortunately, the L* algorithm can only learn DFAs over finite alphabets, which limits its applicability. In this paper, we extend L* to learn symbolic automata whose transitions use predicates over rational numbers, i.e., over infinite and dense alphabets. Our result makes the L* algorithm applicable to new settings like (real) RGX and time series. Furthermore, our proposed algorithm for learning each predicate is optimal in the sense that it asks the teacher a number of queries that is at most linear in the size of the predicate's representation.
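The flavour of learning a predicate over a dense domain from membership queries can be conveyed with a simple interval-narrowing sketch; this is not the paper's optimal algorithm (which bounds queries by the size of the predicate's representation), just a hedged illustration of query-driven predicate learning, using exact rational arithmetic and a hypothetical `learn_threshold` name.

```python
from fractions import Fraction

def learn_threshold(member, lo, hi, eps):
    """Learn the unknown threshold t of the predicate 'x >= t' to within
    width eps, given membership queries member(x) = (x >= t) and the
    promise lo < t <= hi. Query count is logarithmic in (hi - lo) / eps."""
    lo, hi = Fraction(lo), Fraction(hi)
    assert not member(lo) and member(hi), "threshold must lie in (lo, hi]"
    while hi - lo >= eps:
        mid = (lo + hi) / 2
        if member(mid):   # mid >= t: shrink from above
            hi = mid
        else:             # mid < t: shrink from below
            lo = mid
    return lo, hi
```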
In diffusion auctions, sellers can leverage an underlying social network to broaden participation, thereby increasing their potential revenue. Specifically, sellers can incentivise participants in their auction to diffuse information about the auction through the network. While numerous variants of such auctions have been recently studied in the literature, the formal verification and strategic reasoning perspectives have not been investigated yet. Our contribution is threefold. First, we introduce a logical formalism that captures the dynamics of diffusion and its strategic dimension. Second, for such a logic, we provide model-checking procedures that allow one to verify properties like the Nash equilibrium, and that pave the way towards checking the existence of sellers' strategies. Third, we establish computational complexity results for the presented algorithms.
We study the problem of computing a fair means clustering of discrete vectors, which admits an equivalent formulation as editing a colored matrix into one with few distinct color-balanced rows by changing at most k values. While NP-hard in both the fairness-oblivious and the fair settings, the problem is well-known to admit a fixed-parameter algorithm in the former "vanilla" setting. As our first contribution, we exclude an analogous algorithm even for highly restricted fair means clustering instances. We then proceed to obtain a full complexity landscape of the problem, and establish tractability results which capture three means of circumventing our obtained lower bound: placing additional constraints on the problem instances, fixed-parameter approximation, or using an alternative parameterization targeting tree-like matrices.