Current mainstream AI, at least as presented in the media and measured by the number of people involved and papers published, is mainly about big data, deep learning, and the recently trendy large language models. All of these are data-driven, model-free, number-crunching techniques. Their immense success in some areas, such as computer vision and natural language processing, started the next hype in the era of AI and raises the question of whether neural approaches, after being dismissed at the beginning of the AI era, have finally conquered the world of AI and proved applicable to every problem. A deeper look at these new techniques shows that they suffer from issues similar to those of old-fashioned AI techniques: brittleness, strange mistakes, and heavy dependence on the data used for training. Moreover, their results are hard to explain and come with no guarantees, which is a crucial issue in some application areas. In this paper, we look at the core principles of neural ML techniques, that is, being data-driven rather than knowledge-based and model-free rather than model-based, and we argue that symbolic knowledge models can still contribute to the design of trustworthy and explainable AI systems. Specifically, we focus on hierarchical reasoning, namely hierarchical planning, which is useful for highly complex problems but is not addressed by current neural models. We propose a research plan consisting of specific problems in hierarchical planning as an example of a knowledge-intensive approach to problem solving. We show close connections between these problems that allow a smooth transition between the techniques used to solve them. We also propose an ultimate goal of this endeavor, the autonomous construction of hierarchical planning models, which addresses the crucial problem of knowledge-based approaches -- how to obtain a formal model (extract knowledge from data).
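To make hierarchical planning concrete, the following minimal sketch shows HTN-style task decomposition, where human-supplied methods rewrite abstract tasks into primitive actions. This is our own illustration of the general idea, not a system from the paper; the domain and all names are hypothetical.

```python
# Minimal HTN-style decomposition: methods (human-supplied knowledge)
# rewrite abstract tasks into subtasks until only primitives remain.
from typing import Dict, List

METHODS: Dict[str, List[str]] = {          # hypothetical domain knowledge
    "deliver_package": ["pick_up", "transport", "drop_off"],
    "transport": ["load_truck", "drive", "unload_truck"],
}
PRIMITIVES = {"pick_up", "drop_off", "load_truck", "drive", "unload_truck"}

def decompose(task: str) -> List[str]:
    """Recursively expand a task into a sequence of primitive actions."""
    if task in PRIMITIVES:
        return [task]
    plan: List[str] = []
    for subtask in METHODS[task]:
        plan.extend(decompose(subtask))
    return plan

print(decompose("deliver_package"))
# ['pick_up', 'load_truck', 'drive', 'unload_truck', 'drop_off']
```

The autonomous model construction the paper proposes would amount to learning the METHODS table from data rather than hand-coding it.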
This paper introduces the Shepherd Test, a new conceptual test for assessing the moral and relational dimensions of superintelligent artificial agents. The test is inspired by human interactions with animals, where ethical considerations about care, manipulation, and consumption arise in contexts of asymmetric power and self-preservation. We argue that AI crosses an important, and potentially dangerous, threshold of intelligence when it exhibits the ability to manipulate, nurture, and instrumentally use less intelligent agents, while also managing its own survival and expansion goals. This includes the ability to weigh moral trade-offs between self-interest and the well-being of subordinate agents. The Shepherd Test thus challenges traditional AI evaluation paradigms by emphasizing moral agency, hierarchical behavior, and complex decision-making under existential stakes. We argue that this shift is critical for advancing AI governance, particularly as AI systems become increasingly integrated into multi-agent environments. We conclude by identifying key research directions, including the development of simulation environments for testing moral behavior in AI, and the formalization of ethical manipulation within multi-agent systems.
Bandit algorithms and Large Language Models (LLMs) have emerged as powerful tools in artificial intelligence, each addressing distinct yet complementary challenges in decision-making and natural language processing. This survey explores the synergistic potential between these two fields, highlighting how bandit algorithms can enhance the performance of LLMs and how LLMs, in turn, can provide novel insights for improving bandit-based decision-making. We first examine the role of bandit algorithms in optimizing LLM fine-tuning, prompt engineering, and adaptive response generation, focusing on their ability to balance exploration and exploitation in large-scale learning tasks. Subsequently, we explore how LLMs can augment bandit algorithms through advanced contextual understanding, dynamic adaptation, and improved policy selection using natural language reasoning. By providing a comprehensive review of existing research and identifying key challenges and opportunities, this survey aims to bridge the gap between bandit algorithms and LLMs, paving the way for innovative applications and interdisciplinary research in AI.
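As a concrete illustration of the exploration-exploitation balance the survey centers on, here is a minimal epsilon-greedy bandit choosing among candidate prompts. This is a generic sketch under assumed inputs, not a method from any surveyed paper; the prompt set and reward signal are placeholders.

```python
import random

prompts = ["prompt_A", "prompt_B", "prompt_C"]   # hypothetical templates
counts = {p: 0 for p in prompts}                 # pulls per arm
values = {p: 0.0 for p in prompts}               # running mean reward

def select(epsilon: float = 0.1) -> str:
    """Explore a random prompt with probability epsilon, else exploit."""
    if random.random() < epsilon:
        return random.choice(prompts)
    return max(prompts, key=lambda p: values[p])

def update(prompt: str, reward: float) -> None:
    """Incremental update of the mean reward estimate for one arm."""
    counts[prompt] += 1
    values[prompt] += (reward - values[prompt]) / counts[prompt]

true_quality = {"prompt_A": 0.3, "prompt_B": 0.7, "prompt_C": 0.5}
for _ in range(1000):
    p = select()
    # Stand-in reward; in practice this would be user feedback or an
    # automatic quality score of the LLM's response under prompt p.
    update(p, true_quality[p] + random.gauss(0.0, 0.1))

print(max(prompts, key=lambda p: values[p]))     # most likely 'prompt_B'
```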
In the Multi-Agent Path Finding (MAPF) problem, the aim is to find collision-free paths for multiple agents. MAPF has many practical applications and has spawned massive research interest over the past two decades. Most MAPF research has assumed that every agent is assigned a target it must reach. This assumption often does not hold in key applications such as automated warehouses and parking lots, where some agents are assigned targets to reach while others, denoted unassigned agents, can either stay idle or move to clear the way for the assigned agents. In this paper, we introduce this important problem, explain what makes it unique, and encourage the entire community to work on it.
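To pin down what a valid MAPF solution requires, the sketch below (our illustration, not code from the paper) validates a candidate solution against the two standard conflicts, vertex and edge (swap) collisions, and shows an unassigned agent stepping aside for an assigned one.

```python
# A MAPF solution assigns each agent a path: one grid cell per timestep.
# It is valid if no two agents share a cell (vertex conflict) or swap
# cells between consecutive timesteps (edge conflict).
from typing import Dict, List, Tuple

Cell = Tuple[int, int]

def is_collision_free(paths: Dict[str, List[Cell]]) -> bool:
    horizon = max(len(p) for p in paths.values())
    # Agents that arrive early are assumed to wait at their last cell.
    padded = {a: p + [p[-1]] * (horizon - len(p)) for a, p in paths.items()}
    agents = list(padded)
    for t in range(horizon):
        seen: Dict[Cell, str] = {}
        for a in agents:
            if padded[a][t] in seen:                      # vertex conflict
                return False
            seen[padded[a][t]] = a
        for i, a in enumerate(agents):                    # edge conflicts
            for b in agents[i + 1:]:
                if t > 0 and padded[a][t] == padded[b][t - 1] \
                        and padded[b][t] == padded[a][t - 1]:
                    return False
    return True

# Unassigned agent "u" has no target; it steps aside so that assigned
# agent "a" can pass through cell (0, 1).
paths = {"a": [(0, 0), (0, 1), (0, 2)], "u": [(0, 1), (1, 1), (1, 1)]}
print(is_collision_free(paths))  # True
```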
Recent advances in generative AI for music have achieved remarkable fidelity and stylistic diversity, yet these systems often fail to align with nuanced human preferences because of the loss functions they optimize. This paper advocates for the systematic application of preference alignment techniques to music generation, addressing the fundamental gap between computational optimization and human musical appreciation. Drawing on recent breakthroughs, including MusicRL's large-scale preference learning, multi-preference alignment frameworks such as the diffusion-based preference optimization in DiffRhythm+, and inference-time optimization techniques such as Text2midi-InferAlign, we discuss how these techniques can address music's unique challenges: temporal coherence, harmonic consistency, and subjective quality assessment. We identify key research challenges, including scalability to long-form compositions and reliability in preference modelling, among others. Looking forward, we envision preference-aligned music generation enabling transformative applications in interactive composition tools and personalized music services. This work calls for sustained interdisciplinary research combining advances in machine learning and music theory to create music AI systems that truly serve human creative and experiential needs.
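For readers unfamiliar with preference alignment, the sketch below shows a generic direct preference optimization (DPO) loss over pairs of preferred and dispreferred generations. It is a standard formulation, not the specific objective of MusicRL, DiffRhythm+, or Text2midi-InferAlign, and all tensors are placeholders.

```python
import torch
import torch.nn.functional as F

def dpo_loss(logp_chosen, logp_rejected,
             ref_logp_chosen, ref_logp_rejected, beta: float = 0.1):
    """Generic DPO loss: push the policy to widen the log-probability
    margin between preferred and dispreferred outputs relative to a
    frozen reference model."""
    policy_margin = logp_chosen - logp_rejected
    ref_margin = ref_logp_chosen - ref_logp_rejected
    return -F.logsigmoid(beta * (policy_margin - ref_margin)).mean()

# Placeholder log-probabilities for a batch of four preference pairs,
# e.g. sequence log-likelihoods of two candidate music continuations.
lp_c, lp_r = torch.randn(4), torch.randn(4)
rlp_c, rlp_r = torch.randn(4), torch.randn(4)
print(dpo_loss(lp_c, lp_r, rlp_c, rlp_r))
```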
Large language models (LLMs) have become indispensable, but the most celebrated efficiency methods---mixture-of-experts (MoE), speculative decoding, and complex retrieval-augmented generation (RAG)---were built for hyperscale providers with vast infrastructure and elite teams. Outside that context, their benefits collapse into overhead, fragility, and wasted carbon. The result is that a handful of Big Tech companies benefit, while thousands of hospitals, schools, governments, and enterprises are left without viable options. We argue that the next frontier is not greater sophistication at scale, but robust simplicity: efficiency that thrives under modest resources and minimal expertise. We propose a new research agenda: retrofitting pretrained models with more efficient architectures without retraining, inventing lightweight fine-tuning that preserves alignment, making reasoning economical despite long chains of thought, enabling dynamic knowledge management without heavy RAG pipelines, and adopting Overhead-Aware Efficiency (OAE) as a standard benchmark. By redefining efficiency to include adoption cost, sustainability, and fairness, we can democratize LLM deployment---ensuring that optimization reduces inequality and carbon waste rather than amplifying them.
As advances in artificial intelligence (AI) and machine learning (ML) continue to transform commercial applications, the scientific community is increasingly eager to harness AI/ML’s power to accelerate modeling and discovery. However, purely data-driven AI methods often lack interpretability, generalizability, and consistency with established scientific principles. Conversely, traditional process-based models embody deep scientific knowledge but suffer from limited scalability or incomplete representation of complex systems. Knowledge-guided machine learning (KGML) offers a promising path forward by integrating scientific knowledge with data-driven approaches to produce AI models that are robust, trustworthy, and capable of advancing both AI and science. This talk summarizes the foundations of KGML, outlines a taxonomy for organizing research efforts, and highlights emerging opportunities for broad scientific impact.
Recent research highlights the lack of reliability of large language models (LLMs) in tasks requiring complex reasoning. While they can produce impressively fluent text in response to prompts, they can fail on basic reasoning skills, such as recognizing that left is the opposite of right. They struggle even more with grounding such concepts in real-world contexts involving perception and action. Addressing real-world problems, however, typically requires models composed of multiple interdependent learners, with strong capabilities for composition and reasoning. In this talk, I will discuss the reasoning challenges of LLMs and how symbolic representations can enhance neural models by enabling Spatial and Compositional Reasoning over complex linguistic structures, grounding language in visual perception, integrating multiple modalities, and dealing with uncertainty. I will give an overview of recent research in Neurosymbolic (NeSy) modeling and emphasize the need for community-driven libraries to advance this direction. As part of this effort, I will introduce the DomiKnowS framework developed by my team, which combines symbolic and sub-symbolic representations to tackle complex, AI-complete problems, integrating symbolic and logical knowledge seamlessly into deep models and LLMs through a range of underlying algorithms.
We propose—somewhat tongue-in-cheek, yet with serious implications—a new test for artificial intelligence: the ability to watch a 90-minute episode of the long-running German crime drama Tatort, and to explain every relevant detail. This involves reconstructing the evolving social network of characters, identifying their beliefs, desires, and intentions, and, crucially, determining who committed the crime. We argue that this task integrates narrative understanding, common-sense reasoning, social cognition, and theory of mind—and thus provides a uniquely challenging benchmark for AI.
Dual-system theory distinguishes between fast, intuitive System 1 and slow, deliberative System 2. While this dichotomy describes many forms of reasoning, it oversimplifies the reality of expert legal reasoning. Legal reasoning is not merely a process of slow, logical deliberation. It is intrinsically normative, embedding precedent analysis, statutory interpretation, policy balancing, and social values. This paper envisions a reasoning architecture for legal reasoning, System L (Legal System 2), which extends traditional System 2 by integrating domain-specific normative frameworks in a structured manner. Using the IRAC (Issue–Rule–Application–Conclusion) structure as a backbone model, System L represents a blueprint for the next generation of cognitive and AI systems capable of human-like legal reasoning.
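As a structural hint of how the IRAC backbone might be operationalized, the sketch below encodes one Issue-Rule-Application-Conclusion pass as a typed record. This is our own minimal illustration, not the System L architecture; all fields and the example are hypothetical.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class IRAC:
    """One Issue-Rule-Application-Conclusion reasoning pass. Fields are
    placeholders for the normative inputs (precedent, statutes, policy)
    that the envisioned System L would supply."""
    issue: str            # the legal question raised by the facts
    rules: List[str]      # applicable statutes and precedents
    application: str      # how the rules map onto the facts
    conclusion: str       # the resulting legal determination

example = IRAC(
    issue="Was a contract formed?",
    rules=["A contract requires offer, acceptance, and consideration."],
    application="The buyer's signed order accepted the seller's quote "
                "in exchange for payment.",
    conclusion="A contract was formed.",
)
print(f"Issue: {example.issue}\nConclusion: {example.conclusion}")
```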
We propose a new theoretical foundation for artificial intelligence (AI) and machine learning (ML), building on ideas in pure mathematics relating to categories and functors. This paper builds on our AAAI 2025 tutorial Thinking with Functors: Category Theory for A(G)I, which provides background material. In addition, our recent papers on the intuitionistic j-do calculus in Topos Causal Models and on GAIA: Categorical Foundations of Generative AI illustrate how to generalize well-known formalisms in AI, such as causal inference and deep learning, to a category-theoretic setting.
Traditionally, the goal of mechanism design was to promote socially desirable behaviour of rational agents, to achieve fairness, or to promote efficiency. I would like to suggest a new subfield of mechanism design, Responsible Mechanism Design, focused on achieving individual accountability of agents for their contributions to the outcome of collective decisions.
Word Sense Disambiguation (WSD) has been a central challenge since the earliest proposals for Machine Translation (MT), most famously Weaver's 1949 memorandum. Classical systems treated WSD as an explicit task, grounded in lexical resources and annotated data. Recently, however, Large Language Models (LLMs) have blurred the boundary between disambiguation and general language understanding, leading some to suggest that WSD might be obsolete. This paper surveys the role of WSD in the LLM era, drawing on recent studies of encoder-based sense separation and disambiguation, and decoder-based definition selection and generation, as well as multilingual evaluation. Closed-source instruction-tuned LLMs now achieve performance comparable to specialized WSD systems, yet systematic weaknesses remain: non-predominant senses are often misclassified and disambiguation biases in MT persist. We argue that WSD is not "dead" but redefined as a diagnostic lens for assessing lexical-semantic competence, robustness, and interpretability in LLMs.
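As a reminder of what treating WSD as an explicit task looks like, here is a simplified Lesk-style baseline that selects the sense whose gloss overlaps most with the context. This classical heuristic is our own illustration, not one of the LLM-based systems surveyed; the two-sense inventory is a toy.

```python
# Simplified Lesk baseline: choose the sense whose gloss shares the
# most words with the surrounding context.
SENSES = {  # toy sense inventory for the ambiguous word "bank"
    "bank_1": "sloping land beside a body of water such as a river",
    "bank_2": "financial institution that accepts deposits and lends money",
}

def lesk(context: str) -> str:
    ctx = set(context.lower().split())
    return max(SENSES, key=lambda s: len(ctx & set(SENSES[s].split())))

print(lesk("He sat on the bank of the river watching the water"))  # bank_1
print(lesk("She opened an account at the bank to deposit money"))  # bank_2
```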
Artificial intelligence seems to be taking over the world with systems that model pixels, words, and phonemes. The world is arguably made up not of pixels, words, and phonemes, but of entities (objects, things, including events) with properties and relations among them. Surely we should model these, not the perception or description of them. You might suspect that the concentration on modeling words and pixels is because all of the (valuable) data in the world is in the form of text and images. But if you look into almost any company, you will find that its most valuable data is in spreadsheets, databases, and other relational formats. These are not the forms studied in introductory machine learning; they are full of product numbers, student numbers, transaction numbers, and other identifiers that cannot be interpreted naively as numbers. The field that studies this sort of data goes by various names, including relational learning, statistical relational AI, and many others. This paper explains why relational learning is not taking over the world -- except in a few cases with restricted relations -- and what needs to be done to bring it to its rightful prominence.
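A minimal sketch of what learning over identifiers means in practice: since a student number carries no numeric meaning, relational methods learn a representation per entity, as in matrix factorization. This is our own toy illustration under hypothetical data, not an example from the paper.

```python
# Identifiers index entities; they carry no numeric meaning, so instead
# of feeding IDs to a model as magnitudes, relational methods learn one
# representation per entity (here, matrix-factorization style).
import numpy as np

enrolments = [  # (student_id, course_id, grade) -- hypothetical toy data
    (1001, 7, 3.7), (1002, 7, 2.3), (1001, 9, 3.9),
]

rng = np.random.default_rng(0)
dim = 4
student_emb = {s: rng.normal(size=dim) for s, _, _ in enrolments}
course_emb = {c: rng.normal(size=dim) for _, c, _ in enrolments}

def predict_grade(student: int, course: int) -> float:
    """Predicted affinity as a dot product of learned entity vectors
    (untrained here; training would fit them to observed grades)."""
    return float(student_emb[student] @ course_emb[course])

print(predict_grade(1001, 9))
```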
Due to their ability to follow natural language instructions, vision-language-action (VLA) models are increasingly prevalent in the embodied AI arena, following the widespread success of their precursors, LLMs and VLMs. In this paper, we discuss 10 principal milestones in the ongoing development of VLA models: multimodality, reasoning, data, evaluation, cross-robot action generalization, efficiency, whole-body coordination, safety, agents, and coordination with humans. Furthermore, we discuss the emerging trends of using spatial understanding, modeling world dynamics, post-training, and data synthesis, all aiming to reach these milestones. Through these discussions, we hope to bring attention to the research avenues that may accelerate the development of VLA models into wider acceptability.
AI is currently in the midst of a boom, mostly due to the success and predominance of large language models and associated models for other perceptual tasks such as computer vision. Yet AI has experienced several booms and busts over the past 75 years. While the booms are driven by commercial potential, the busts that follow affect not only commercial investment but also research funding and trends. This paper examines the expert systems boom of the 1980s and the AI winter that followed, identifies similarities, analogs, and differences with the current boom, and, based on these, projects potential outcomes and directions for AI research that may follow when the current enthusiasm wanes. The presentation is distinct from currently active discussions and debates about the potential and limitations of large models, such as whether problems like hallucination will be solved, whether they can reason, or whether they will achieve AGI; rather, it examines previous AI techniques and how they evolved once their capabilities and limitations became well understood.
From the expert AI systems of the 1970s to the self-supervised systems of the 2020s, the pendulum of AI development has swung from heavy reliance on human feedback to minimal or no reliance over the last 50 years. Self-supervised approaches have contributed significantly to the success and scalable development of AI. However, today we are at a tipping point where the future of AI, and whether society ends up benefiting from this technology in the long run, depends critically on subsequent AI development aligning with human goals and values. Realizing this, efforts to align AI models with human expectations and values have been ramping up. Human feedback, however, remains limited and difficult to elicit. Thus, a key question lingers: how can we scale up the alignment of AI systems with individual expectations and societal norms? This talk and paper provide an overview of and perspective on efforts to answer this question.
Voting is one of the most prominent applications of preference aggregation and computational social choice. While much of the literature focuses on models involving discrete candidates, there has been a growing interest in voting over divisible resources, such as budget, space, and time. In this survey, we review existing work on voting in divisible settings, including fundamental models of budget aggregation, fair mixing, and cake sharing. We also establish connections among these models, highlight unifying themes across different frameworks, and suggest directions for future research.
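To fix ideas in the budget-aggregation setting, where each voter proposes a division of a budget over projects, the sketch below compares two simple baselines: the utilitarian mean and a renormalized coordinatewise median. These are textbook baselines of our own choosing, not mechanisms singled out by the survey, and the votes are toy data.

```python
import numpy as np

# Each row is one voter's proposed division of the budget (sums to 1).
votes = np.array([
    [0.6, 0.3, 0.1],
    [0.2, 0.5, 0.3],
    [0.5, 0.1, 0.4],
])

mean_rule = votes.mean(axis=0)        # utilitarian average; stays feasible
med = np.median(votes, axis=0)        # coordinatewise median sums to 1.1,
median_rule = med / med.sum()         # so it must be renormalized

print(mean_rule)     # [0.433... 0.3 0.266...]
print(median_rule)   # [0.454... 0.272... 0.272...]
```

That the raw coordinatewise median can violate feasibility and require renormalization is one illustration of why the literature develops more carefully designed aggregation rules.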
The architectural blueprint of today’s leading text-to-image models contains a fundamental flaw: an inability to handle logical composition. This survey investigates this breakdown across three core primitives—negation, counting, and spatial relations. Our analysis reveals a dramatic performance collapse: models that are accurate on single primitives fail precipitously when these are combined, exposing severe interference. We trace this failure to three key factors. First, training data show a near-total absence of explicit negations. Second, continuous attention architectures are fundamentally unsuitable for discrete logic. Third, evaluation metrics reward visual plausibility over constraint satisfaction. By analyzing recent benchmarks and methods, we show that current solutions and simple scaling cannot bridge this gap. Achieving genuine compositionality, we conclude, will require fundamental advances in representation and reasoning rather than incremental adjustments to existing architectures.
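To make constraint satisfaction, as opposed to visual plausibility, concrete as an evaluation target, the sketch below checks the three primitives (negation, counting, spatial relations) against hypothetical object-detector output. It is our own illustrative checker, not a benchmark from the literature analyzed here.

```python
from typing import List, Optional, Tuple

# Hypothetical detector output: (label, x_center, y_center) per object.
Detection = Tuple[str, float, float]

def satisfies(dets: List[Detection],
              absent: Optional[str] = None,
              count: Optional[Tuple[str, int]] = None,
              left_of: Optional[Tuple[str, str]] = None) -> bool:
    """Check discrete compositional constraints against detections."""
    labels = [d[0] for d in dets]
    if absent is not None and absent in labels:            # negation
        return False
    if count is not None and labels.count(count[0]) != count[1]:  # counting
        return False
    if left_of is not None:                                # spatial relation
        a, b = left_of
        xs_a = [x for l, x, _ in dets if l == a]
        xs_b = [x for l, x, _ in dets if l == b]
        if not xs_a or not xs_b or max(xs_a) >= min(xs_b):
            return False
    return True

dets = [("cat", 0.2, 0.5), ("cat", 0.3, 0.6), ("dog", 0.7, 0.5)]
# Prompt: "two cats to the left of a dog, and no bird"
print(satisfies(dets, absent="bird", count=("cat", 2), left_of=("cat", "dog")))
# True
```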