AAAI.2026 - New Faculty Highlights

Total: 29

#1 Towards Trustworthy Multimodal AI Systems

Author: Chirag Agarwal

Machine learning models have become ubiquitous over the last decade, and with their increasing use in critical applications (e.g., healthcare, financial systems, and crime forecasting), it is vital to ensure that ML developers and practitioners understand and trust their decisions. This problem has become paramount in the era of frontier models, which are developed by training billion-parameter models on broad, uncurated datasets with extensive compute. In this talk, we will first explore the (un)reliability of existing multimodal explainability techniques in large language and multimodal models and examine the robustness and safety implications of mechanistic interpretability tools. Next, we will delve into two complementary threads: i) domain-specific safety and trustworthiness evaluation that surfaces risks missed by generic red-teaming, focusing on multilingual and distribution-shifted settings; and ii) methods that explicitly train and assess reasoning in medical LLMs.

Subject: AAAI.2026 - New Faculty Highlights


#2 Towards Agents That Exhibit Human-Like Autonomy in Complex Environments

Author: Rohan Chandra

Deploying intelligent, autonomous agents (e.g., autonomous vehicles and robots) in the real world has been a longstanding goal in robotics and artificial intelligence (AI). We have already begun to witness the emergence of vacuum robots in our homes, service robots in warehouses, and even self-driving cars on our way to work. These environments are often dense, constrained, and unstructured, with heterogeneous agents, each with their own unique behaviors and objectives. While agents today are designed to navigate these environments safely, their overly conservative nature often leads to slow and jerky motion (frequent stopping and freezing), lack of social compliance (not giving way to other people, blocking doorways and intersections), and poor adaptability across diverse complex environments (failure due to sudden accidents, e.g., liquid spills). In other words, these robots often fail to capture the essence of human-like autonomy, which involves the ability to take calculated risks, even in complex environments. In this talk, I will describe my vision for a paradigm shift in the way intelligent physical agents navigate highly dense, heterogeneous, constrained, and unstructured environments using human-like autonomy.


#3 Beyond Neuron-Level Sparsity: Achieving Faithful and Interpretable LLMs with Mixture of Decoders

Author: Grigorios Chrysos

As large language models (LLMs) scale, ensuring interpretability and privacy becomes critical. This talk addresses these interconnected challenges with novel approaches to model specialization and safety. First, we tackle the dense, distributed nature of LLM representations by casting Mixture-of-Experts (MoE) as a tensor decomposition, enabling specialized experts in a factorized space. Second, we argue that current neuron-level sparsity methods create a severe accuracy-sparsity trade-off, and we propose a paradigm shift to layer-level sparsity with the Mixture of Decoders (MxD). We explain how MxD uses tensor factorization to expand dense layers into thousands of specialized, full-rank sublayers, demonstrating how it significantly outperforms alternatives in preserving model faithfulness and performance across LLMs up to 3B parameters. Finally, we address privacy in open-weight models by proposing a scalable and certifiable algorithm that induces maximal uncertainty on protected instances, proving tight bounds that characterize the resulting privacy-utility trade-off.


#4 Bridging Public Health with Clinical Decisions from a Data Centric Perspective

Author: Jiaming Cui

Public health and clinical decisions are intertwined. Public health crises place a high burden on healthcare facilities, forcing them to make decisions such as maintaining quality of care versus treating more people. Meanwhile, sub-optimal clinical decisions also cause downstream effects on communities. For example, discharging patients too early may increase disease spread. Motivated by this, we bring a data-centric perspective to bridging clinical decisions and public health within the context of infectious diseases. This work addresses multiple challenges arising from effectively utilizing rich clinical datasets and issues stemming from the complexity of disease spread dynamics in healthcare facilities. We will cover methods developed to address these challenges, including better-designed models to optimize disease surveillance and control policies and new techniques for end-to-end learning with mechanistic models. We will conclude by discussing emerging challenges and opportunities at the intersection of machine learning, scientific modeling, and clinical decision-making for computer scientists, epidemiologists, and computational biologists.


#5 Towards Human-centered Proactive Conversational AI

Author: Yang Deng

Conversational AI agents are envisioned to provide social support or functional services to human users via natural language interactions. The popularity of conversational AI has grown unprecedentedly with the advent of ChatGPT, which showcases exceptional proficiency in context understanding and response generation with large language models (LLMs). However, typical conversational systems are built to follow instructions, which means that the conversation is led by the user, and the system simply follows the user's instructions or intents. My research endows conversational AI with the capability to create or control the conversation to achieve conversational goals by taking initiative and anticipating impacts on itself or human users, namely Proactive Conversational AI. I will also highlight the importance of moving towards human-centered proactive conversational AI that emphasizes human needs and expectations and considers the ethical and social implications of these agents, rather than solely focusing on technological capabilities.


#6 Unlocking the Power of Large Multimodal Models for Robot Learning: Robustness, Generalization, and Opportunities

Author: Mingyu Ding

Large multimodal models (LMMs) have revolutionized AI by demonstrating remarkable capabilities in vision, language, audio, and other domains, particularly in understanding and generalization tasks. Yet, moving beyond passive understanding to active interaction requires embodied agents, such as robots, that can harness the capabilities of AI models to act within the physical world. My core research aims to build embodied agents that reason about and interact with the physical world with human-like commonsense. Specifically, I design algorithms and representations that enable robots to perceive their environment, reason about physical properties, and plan long-horizon actions for both manipulation and locomotion. These advances are grounded in the integration of large-scale AI models with embodied control. I organize this agenda into three stages: (1) injecting actions into LMMs to form vision–language–action (VLA) models; (2) learning from human motion and contact to enrich physical reasoning; and (3) advancing whole-body robot loco-manipulation guided by LMMs toward embodied artificial general intelligence (AGI). The talk details recent advances in leveraging LMMs for robot learning, emphasizing the promise of robust generalization across diverse environments, tasks, and modalities. I will highlight contributions at the intersection of perception, reasoning, and control, and outline open challenges and future opportunities toward enabling humanoid robots that can robustly understand, interact, and collaborate with humans in complex real-world settings.


#7 Augmenting Human Creativity with Machine Learning

Author: Hao-Wen Dong

In this talk, I will survey my work in three main research directions: 1) generative models for music creation, 2) AI-assisted music creation tools, and 3) multimodal generative models for content creation. In particular, I will discuss our recent work on AI-assisted video editing that explores novel machine learning models that can cut, select, and rearrange a long video into a short one. In the first project, TeaserGen, we proposed a narration-centered teaser generation system that can effectively compress documentaries longer than 30 minutes into teasers under 3 minutes, leveraging pretrained LLMs and language-vision models. In the second project, REGen, we proposed a retrieval-embedded generation framework that allows an LLM to quote multimodal resources while maintaining a coherent narrative. I will conclude by discussing our future work towards next-generation video editing interfaces using multimodal LLMs and retrieval-embedded generation, as well as playful human-AI music co-creation systems where the user can control a music generation system through hand gestures and body movements.


#8 Teach AI What It Doesn’t Know

Author: Xuefeng Du

This talk surveys my research journey toward building reliable machine learning systems that behave safely and predictably in the open world. While modern machine learning models, including foundation models (FMs), have demonstrated unprecedented capabilities, they often suffer from reliability failures under distribution shift, leading to overconfident mispredictions, hallucinated generations, or susceptibility to adversarial prompts. My research rethinks reliability not as an afterthought but as a first-class algorithmic principle, to be optimized alongside accuracy with minimal human supervision. The talk is organized around three key threads; to respect the allotted 20-30 minutes, the first two will be discussed briefly. 1. Unknown-Aware Learning via Outlier Synthesis. I introduce a class of learning algorithms that synthesize "virtual outliers" in representation or pixel space to explicitly teach models what they don't know. This includes the VOS, NPOS, and Dream-OOD frameworks, which shape the energy landscape around in-distribution data to avoid overconfidence on out-of-distribution (OOD) inputs. 2. Learning in the Wild with Unlabeled Data. I present theoretical insights and practical algorithms for leveraging unlabeled in-the-wild data to improve reliability. This includes the SAL framework, which uses a gradient-based spectral method to separate potential outliers, and SCONE, which handles semantic and covariate shifts via constrained optimization. These results turn unlabeled data contamination into a learning signal. 3. Reliable Foundation Models. I explore reliability failures in LLMs and multimodal systems. I introduce HaloScope for hallucination detection via subspace separation on LLM representations, and TSV, which performs LLM latent steering for improved hallucination detection. I will also briefly cover LLM security and alignment, including VLMGuard for detecting malicious prompts in vision-language models and a data-centric paradigm for AI alignment through source-aware feedback cleaning. Throughout the talk, I highlight how representation learning, data generation, and theoretical guarantees intersect to produce scalable, label-efficient reliability methods. I will also reflect on my broader vision: designing proactive and collaborative AI systems that anticipate uncertainty and support rich human-AI interaction, especially for underrepresented communities and emerging scientific domains. This talk will be accessible to a broad AAAI audience, combining foundational algorithmic insights with real-world applications and forward-looking perspectives on the future of responsible AI.
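The "energy landscape" shaping mentioned in thread 1 builds on the standard free-energy OOD score from this line of work. As a hedged illustration only (this is the common scoring function, not the VOS training objective itself), the score can be computed directly from a classifier's logits:

```python
import math

def energy_score(logits, temperature=1.0):
    """Free-energy OOD score: E(x) = -T * logsumexp(logits / T).
    Confident in-distribution inputs get low (very negative) energy;
    flat, uncertain logits score closer to zero."""
    t = temperature
    m = max(l / t for l in logits)  # shift for a numerically stable logsumexp
    return -t * (m + math.log(sum(math.exp(l / t - m) for l in logits)))

# A confident prediction scores lower energy than a flat (uncertain) one.
confident = energy_score([10.0, 0.0, 0.0])  # ~ -10.0
uncertain = energy_score([0.0, 0.0, 0.0])   # = -log(3) ~ -1.10
```

Thresholding this score is the usual way such methods flag likely OOD inputs at test time.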


#9 Scaling Human-Centric Trustworthy Foundation Model via Advanced Reasoning and Agentic Frameworks

Author: Yi Ren (May) Fung

As foundation models grow in size and scope, crucial challenges remain in scaling their trustworthiness and adaptability to meet the diverse needs of individual users, as well as mitigating their risk of generating unhelpful, non-factual, or harmful content. To address this, we propose to reframe model reasoning through a unified paradigm of active knowledge grounding that coordinates different tools and modalities. First, to scale reasoning depth and creativity, we introduce the novel paradigm of Thinking with Images to encourage models to externalize intermediate structure and perform interleaved cross-modal advanced reasoning beyond text-centric cues. To further scale honesty and bridge knowledge gaps reliably, we develop one of the first vision-language deep research agents, WebWatcher, which actively gathers and verifies information from the web with enhanced fragmented reasoning capability. Ultimately, to scale effective and efficient human-AI collaboration, we propose AdaCtrl as a novel training mechanism for dynamically aligning model behavior with individual user preferences and difficulty awareness to adaptively allocate computational resources. Together, these three pillars of integrating advanced multimodal reasoning, autonomous discovery, and adaptive alignment form a foundational framework for advancing the frontier of next-generation human-centric trustworthy AI systems.


#10 Breaking the Resource Monopoly: LLM Post-Training and Serving with Modest Data and Compute

Author: Jiaxin Huang

Frontier large language models are increasingly powerful, yet many of them are trained on vast proprietary data with intensive compute, raising barriers to exploration and improvement for academic labs and smaller institutions. In this talk, I will present a unified research agenda for breaking the resource monopoly in both post-training and serving. On the training side, I will describe label-free and even zero-data post-training pipelines that let models curate their own reasoning supervision. On the serving side, I will show how cost-aware inference can make adaptive test-time scaling more efficient. Together, these components form a practical LLM system built with modest data and compute resources.


#11 All-Purpose Mean Estimation over R

Author: Jasper C.H. Lee

Given society's increasing reliance on data, its collection and processing into useful information is a technical problem of growing focus and, perhaps paradoxically, a critical bottleneck in many data science and machine learning applications. Yet, even for the most basic statistical problems, such as mean estimation, there is a theory-practice divide. Conventional methods like the sample mean, while supported by theoretical results under strong assumptions, are often brittle in the presence of extreme data. Practitioners thus often resort to ad-hoc and unprincipled "outlier removal" heuristics, which can lead to wrong conclusions (e.g., Millikan's underestimation of the electron charge). In this talk, I will describe my work that essentially resolves the fundamental 1-dimensional mean estimation problem. I will show the construction of a statistically optimal and computationally efficient 1-dimensional mean estimator whose estimation error is optimal even in the leading multiplicative constant, under bare-minimum distributional assumptions (FOCS 2021). Furthermore, I will discuss its various robustness properties (ICML 2025 Oral), in particular highlighting robustness to adversarial sample corruption.
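The brittleness of the sample mean is easy to demonstrate. As a hedged illustration, the classic median-of-means estimator (a textbook robust baseline, not the optimal estimator constructed in the talk) already shrugs off a few extreme points that wreck the sample mean:

```python
import random
import statistics

def median_of_means(xs, k=10):
    """Classic robust baseline: shuffle, split into k equal blocks,
    return the median of the k block means. A few extreme points can
    corrupt at most a few blocks, and the median ignores those."""
    xs = list(xs)
    random.shuffle(xs)
    block = len(xs) // k
    means = [statistics.fmean(xs[i * block:(i + 1) * block]) for i in range(k)]
    return statistics.median(means)

random.seed(0)
data = [random.gauss(0.0, 1.0) for _ in range(1000)] + [1e6] * 3
print(statistics.fmean(data))   # dragged to ~3000 by three outliers
print(median_of_means(data))    # stays near the true mean of 0
```

Three corrupted points out of 1003 pull the sample mean to roughly 3000, while median-of-means remains near zero; the talk's estimator sharpens this kind of robustness down to optimal constants.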


#12 Learning from Imperfect Data: Incremental Learning and Few-shot Learning

Author: Yaoyao Liu

In recent years, artificial intelligence (AI) has achieved great success in many fields. Although impressive advances have been made, AI algorithms still suffer from an important limitation: they rely on static and large-scale datasets. In contrast, human beings naturally possess the ability to learn novel knowledge from real-world imperfect data, such as a small number of samples or a non-static continual data stream. Attaining such an ability is particularly appealing and will push AI models one step further toward human-level intelligence. In this talk, I will present my work on addressing these challenges in the context of incremental learning and few-shot learning. Specifically, I will first discuss how to obtain better exemplars for incremental learning through optimization: I parameterize exemplars and optimize them in an end-to-end manner to obtain high-quality, memory-efficient exemplars. Then, I will present my work on applying incremental learning techniques to more challenging and realistic scenarios, such as object detection and medical imaging. Lastly, I will briefly mention my work on addressing other challenges and discuss future research directions.


#13 Towards Aligned and Efficient Large Language Models

Author: Yu Meng

Large language models (LLMs) have rapidly transformed the landscape of AI, demonstrating remarkable capabilities across reasoning, communication, and problem-solving. Yet, realizing their full potential requires addressing two critical challenges. First, their behavior must be steered and refined after training to ensure reliability, safety, and alignment with human values and intentions. Second, their large scale comes with substantial costs in training and deployment, necessitating research into more efficient methods. My research centers on advancing both of these fronts—making LLMs both aligned and efficient. On one side, I investigate post-training techniques that allow models to better reflect human preferences, demonstrate strong reasoning capabilities, and mitigate hallucination. On the other side, I study methods for improving data efficiency in training and inference efficiency in deployment. Together, these thrusts highlight a broader vision of enabling LLMs that are not only powerful, but also trustworthy and accessible at scale.


#14 Cross-Modal Knowledge Transfer in Time Series AI via Large Vision Models

Author: Jingchao Ni

Time series analysis has progressed from traditional autoregressive models to deep learning, Transformers, and foundation models (FMs), including large language models (LLMs) and large vision models (LVMs). These advances expand model design possibilities and enable time series problem-solving across multiple modalities. This talk will provide an overview of recent developments in large FMs for time series, highlighting frameworks for transferring knowledge from other modalities to time series, and identifying the advantages of LVMs over LLMs in cross-modal knowledge transfer. I will then delve into our recent research on LVMs for time series, discussing (1) mainstream techniques for imaging time series; (2) key strengths and limitations of LVMs in time series modeling; and (3) multimodal frameworks that integrate LVMs for time series encoding. This talk will conclude with an application of LVMs to brain time series analysis in neuroscience. The aim of the talk is to review state-of-the-art (SOTA) AI techniques for time series, highlight unique challenges, and share our recent findings in this promising area.
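Among the imaging techniques alluded to in (1), one of the simplest is the unthresholded recurrence plot, which turns a 1-D series into a 2-D pairwise-distance image that a vision model can consume. This is a hedged, illustrative sketch of one well-known option (alongside line plots and Gramian angular fields), not necessarily the specific method used in this work:

```python
def series_to_image(x):
    """Unthresholded recurrence plot: pixel (i, j) = |x[i] - x[j]|.
    Repeating structure in the series shows up as texture in the image."""
    return [[abs(a - b) for b in x] for a in x]

img = series_to_image([0.0, 1.0, 3.0])
# [[0.0, 1.0, 3.0],
#  [1.0, 0.0, 2.0],
#  [3.0, 2.0, 0.0]]
```

The resulting matrix is symmetric with a zero diagonal, and can be rendered as a grayscale image and fed to a pretrained LVM like any other picture.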


#15 Toward Trustworthy AI for Decision Making in Population Health

Author: Alexander Rodríguez

AI and population health are becoming increasingly intertwined, driven by the growing availability of multimodal data and rapid advances in AI. At the AAAI-26 New Faculty Highlights, I present our efforts to harness these trends to enhance our capacity to model, simulate, and adapt to complex dynamical processes. I first introduce our robust deep learning architectures for real-time outbreak response, highlighting how our frameworks capture uncertainty and dynamics across shifting distributions, multimodal data, hierarchical structures, and relational dependencies. I will then introduce our hybrid approaches that integrate machine learning with science-based mechanistic epidemiological models, including physics-informed neural networks, expert-guided generative models for causal inference, and differentiable agent-based models. Together, these advances illustrate how combining data-driven AI with domain knowledge can enable more reliable, adaptive, and actionable solutions to inform decision making in population health.


#16 Data-Efficient and Contact-Rich Manipulation Through Diffusion Augmentation and Vision-Language Models

Author: Daniel Seita

Recent progress in robot learning has produced impressive results, yet many systems still require learning from large datasets of demonstrations and are less effective in clutter or with highly deformable objects. This talk presents work on data-efficient manipulation using (i) diffusion-based augmentation that synthesizes geometrically consistent images and action labels to reduce demonstration requirements and (ii) Vision-Language Models (VLMs) that inject high-level semantics for contact-rich motion planning in clutter. We will also introduce ManipBench, which evaluates VLMs’ abilities for low-level manipulation. Together, this work shows how the community can move towards robot manipulators that learn and operate with reduced demonstration requirements in cluttered, real-world environments.


#17 Graph-based Label-Efficient Learning: When Graph-Structured Data Meets Limited Labels

Author: Zixing Song

The success of deep learning is highly dependent on large-scale labeled data. This presents a formidable challenge in fields like molecular design and materials science, where data annotation is prohibitively expensive. Consequently, developing label-efficient learning methods that maximize model performance under limited annotation budgets has become increasingly critical. However, most current mainstream label-efficient algorithms, such as active learning and semi-supervised learning, are primarily designed for Euclidean data such as images. They cannot effectively process non-Euclidean graph-structured data and thus overlook the rich topological information embedded within it. In this talk, we aim to bridge this gap through a progressive research path that addresses three core challenges in data annotation for graph-structured data. First, to address the high cost of annotation, we adapt active learning and semi-supervised learning from general domains to explicit graph data, enabling the precise labeling of high-value nodes. Second, to address label scarcity, we pioneer methods to construct and leverage implicit graph structures, propagating existing labels and generating new information to boost the performance of semi-supervised and self-supervised learning. Finally, to address label noise, we fuse explicit and implicit graphs: by learning an implicit structure from noisy explicit graph data, our methods identify and mitigate the impact of noise.
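Label propagation is the textbook mechanism behind "propagating existing labels" over a graph. The following is a hedged, minimal sketch of synchronous majority-vote propagation on an adjacency list (an illustration of the general idea, not the specific methods in the talk):

```python
from collections import Counter

def propagate_labels(adj, seeds, iters=10):
    """Synchronous majority-vote label propagation.
    adj: node -> list of neighbors; seeds: node -> fixed label.
    Each round, every unlabeled node adopts the majority label
    among its currently labeled neighbors."""
    labels = dict(seeds)
    for _ in range(iters):
        new = dict(labels)
        for node, nbrs in adj.items():
            if node in seeds:
                continue  # seed labels stay fixed
            votes = Counter(labels[n] for n in nbrs if n in labels)
            if votes:
                new[node] = votes.most_common(1)[0][0]
        labels = new
    return labels

# Two triangles joined by the edge 2-3, one seed label per triangle.
adj = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2, 4, 5], 4: [3, 5], 5: [3, 4]}
print(propagate_labels(adj, {0: "A", 5: "B"}))
# {0: 'A', 5: 'B', 1: 'A', 2: 'A', 3: 'B', 4: 'B'}
```

With just two seed labels, the topology alone assigns every node to the correct community, which is exactly the leverage that label-efficient graph methods exploit.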


#18 KOALA: Knowledge of Optimization and Learning Algorithms for Healthcare

Author: Kai Wang

The Knowledge of Optimization And Learning Algorithms (KOALA) group studies how to integrate optimization, machine learning, and generative modeling to enable data-driven decision-making under uncertainty. We study decision-focused learning, embedding optimization as a differentiable layer to train models end-to-end for decision quality. We design scalable reinforcement learning algorithms for population and personalized healthcare, and develop efficient bilevel optimization methods for nested and multi-agent decision-making. These directions form a unified framework linking optimization and learning for impactful AI in healthcare. Through collaborations with hospitals and NGOs, our group designs and deploys algorithms for pediatric, diabetes, maternal, and mental health applications. Looking ahead, we aim to unite these foundations with generative AI to build theoretically grounded and socially responsible algorithms that advance trustworthy, real-world AI for health and beyond.


#19 From Few-Shot Learning to Data-Efficient Intelligence

Author: Yaqing Wang

Modern artificial intelligence performs impressively in data-rich settings but still struggles to learn and adapt from only a few examples—a capability central to human intelligence. My research seeks to understand and enable data-efficient generalization, unifying principles across few-shot learning, meta-learning, in-context learning in large language models (LLMs), and adaptive agent behavior. First, I revisit few-shot learning from a foundational perspective, showing why conventional supervised learning breaks down under sparse data and how prior knowledge enables reliable adaptation. I then discuss how these principles extend to real-world scenarios such as scientific discovery and cold-start recommendation, where data are scarce, costly, or dynamically evolving. Finally, I explore how LLMs perform in-context learning and how their adaptive behaviors connect to meta-learning mechanisms. Building on these insights, I develop data-efficient, preference-adaptive agents that quickly align to user needs with minimal interaction. This talk presents a cohesive view of data-efficient intelligence and outlines future directions toward more reliable, human-like learning systems.


#20 From Representation to Reasoning: Toward General-Purpose Visual Intelligence

Author: Chen Wei

This talk surveys my research agenda on advancing general-purpose visual intelligence, moving AI beyond static recognition toward active reasoning and embodied action. A central challenge is enabling AI systems to generalize reliably in low-data and long-tail regimes. I address this by combining multimodal representation learning with agentic reasoning frameworks such as PyVision, which equips vision models to dynamically generate tools for deliberate problem-solving, and ViGaL, which leverages gameplay to instill transferable cognitive skills for reasoning under scarcity. These efforts chart a trajectory from representation and generation to interactive, embodied agents, re-imagining AI as an active collaborator capable of tool use, imagination, and purposeful engagement across both digital and physical environments.


#21 Safe Reinforcement Learning for Trustworthy AI: Theory, Algorithms, and Applications

Author: Honghao Wei

Safe reinforcement learning (RL) has emerged as a key paradigm for deploying AI in high-stakes domains such as autonomous driving, robotics, healthcare, and recommender systems. By embedding constraints into the learning process, safe RL enables agents to optimize performance while satisfying critical requirements, including collision avoidance, resource limits, and system reliability. Such guarantees are indispensable for real-world AI, where failures can cause physical harm, economic loss, or loss of trust. At the same time, demand for trustworthy AI continues to grow as machine learning is increasingly deployed in human-centered applications. This makes it essential to design RL algorithms that are not only efficient but also reliable, robust, and aligned with societal needs.


#22 Efficient Model Specialization via Training-time and Test-time Adaptation

Author: Huanrui Yang

In this talk, we discuss efficient model specialization algorithms that adapt a pretrained model to downstream tasks while improving its efficiency, generalize efficiently to multiple tasks via dynamic architectures, and improve inference-time efficiency by exploiting the diversity of functionality across model blocks. These research directions serve as the foundation for co-designing models, tasks, systems, and hardware towards a future of reconfigurable, efficient intelligence.


#23 Toward Causal Foundation World Models: From Representation to Decision-Making

Author: Mengyue Yang

My research lies at the intersection of causality, reinforcement learning, world models, and multi-agent systems. I aim to develop causal foundation world models that enable agents to interpret the past, reason about the future, and act reliably in dynamic, non-stationary, and open-ended environments. My work spans causal representation learning (e.g., CausalVAE), causal reasoning in large language models, and causality-driven exploration in open-ended worlds. These contributions have appeared in leading venues such as NeurIPS, ICML, ICLR, CVPR, and KDD, and have been recognized through over 770 citations and the Rising Star in AI award (2024). Looking forward, my agenda focuses on scalable, trustworthy causal world models for healthcare, robotics, scientific discovery, and digital systems.


#24 Deep Model Reuse: Paving the Way for Efficient and Generalizable AI Systems

Author: Xingyi Yang

Humans easily apply learned skills to different situations, a flexibility that AI systems still struggle to achieve. Current AI models are often confined to their training setup, leading to isolated developments and a narrow scope of application. This largely restricts the creation of flexible and general-purpose AI systems. Deep Model Reuse presents a novel solution. Imagine tapping into a vast library of pre-trained models, each a master in its specialized domain. Our approach re-purposes these existing models, extracting and transforming their knowledge for the development of novel AI systems. In this talk, we explore the essential techniques of this transformative process, highlighting the shift towards versatile and efficient AI that mirrors human cognition's adaptability. We introduce three foundational pillars of deep model reuse: understanding, composing, and refining. First, we investigate the internal behavior of neural networks—using language models as explainers and analyzing the representation space of diffusion models—to uncover how and what models have learned. Second, we develop methods to transform and compose models through weight mapping, knowledge distillation, and model dissection, enabling the creation of new capabilities by reassembling existing expertise. Third, we enhance reliability by editing model behaviors and mitigating biases, ensuring robustness in complex and dynamic environments. We demonstrate the power of this paradigm in generative AI, where model reuse leads to efficient diffusion models free from spectral bias, improved compositional understanding in video generation, and the repurposing of 2D/3D models for 3D/4D content creation. By shifting from training from scratch to intelligently reusing and recombining models, we move closer to adaptive, scalable, and human-like AI systems—ushering in a new era of sustainable and general intelligence.


#25 Towards Continually-Evolving AI: Selective and Expandable Multimodal Memory System

Author: Jaehong Yoon

Our world is constantly evolving, and human beings continuously enhance their knowledge by learning from their experiences throughout life. Despite significant advancements in embodied AI, current agents struggle to operate reliably, robustly, and continually in complex, real-world multimodal environments, whose problems span large and diverse domains, especially when agents face out-of-distribution (OOD) scenarios they have not previously encountered. Our goal is to develop a multimodal, embodied AI system that continually enhances its capabilities and skills through safe and robust interactions with an ever-changing, multimodal world. This is enabled by a novel, adaptively expandable memory architecture that integrates both long- and short-term information across multiple modalities. The system selectively decides what to store and learn, filtering out adversarial or low-quality inputs to prevent negative transfer and distraction, while improving overall efficiency and effectiveness.
