Interactive theorem provers (ITPs) are computer programs in which axioms and a conjecture are stated in a formal language, and a user provides the ITP with relatively high-level steps of a formal proof for the conjecture. Then, by invoking automated theorem provers, the ITP tries to generate low-level steps that fill the gaps between the steps provided by the user, thus forming a complete formal proof of the conjecture. The ITP also checks the entire formal proof against the axioms, thus confirming the soundness of all derivations in the formal proof. In this talk, I will discuss the existing opportunities and potential benefits of applying ITPs to reason about and verify AI concepts, algorithms, and software. I will also discuss the challenges we face in applying ITPs in AI and reaping those benefits. I will do so by discussing a number of my previous projects on the application of ITPs to different AI concepts, algorithms, and software systems. These projects span different areas of planning (classical planning, temporal planning, and planning under uncertainty) as well as algorithms with applications in algorithmic game theory, such as general graph matching and online matching.
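To make this workflow concrete, here is a minimal sketch in Lean 4, one widely used ITP (the example is purely illustrative and not drawn from the projects above): the conjecture is stated formally, the user supplies a single high-level step, and the kernel checks the resulting low-level proof against the axioms; for larger proofs, automation would fill the remaining gaps.

```lean
-- Minimal illustrative example: the conjecture is stated formally,
-- the user supplies one high-level step (citing a library lemma),
-- and Lean's kernel checks the elaborated proof term against the axioms.
theorem add_comm_example (m n : Nat) : m + n = n + m := by
  exact Nat.add_comm m n  -- low-level justification verified by the kernel
```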
Planning is the act of deliberative thinking before acting. It is based on a symbolic model of the world and the options to act in it, usually defined in function-free first-order logic. The task is to find a sequence of actions (a plan) that leads from a given current state to a desired goal state. The basic, purely physical description may be augmented with a partially ordered grammar-like structure (a Hierarchical Task Network or HTN), which can describe expert knowledge, or practical, legal, or operational requirements. In this talk, I will survey a variety of methods for automatically deriving plans using symbolic methods for planning -- from both my past and future research. These symbolic methods -- in some sense -- translate planning problems into other, simpler symbolic representations and reason over them to find plans. As a basis for these methods, I will first introduce relevant theoretical results on planning: first, the expressive power of planning formalisms (ECAI'14, ICAPS'16) and, second, the computational complexity of HTN planning and related tasks such as HTN plan verification, plan modification, and plan recognition (ICAPS'15, ICAPS'16). Based on these theoretical results, I will explain why SAT-based HTN planning is possible and how it can be implemented. To this end, I will survey several of my publications at top-tier conferences, including papers at ICAPS'17, AAAI'18, AAAI'19, IJCAI'19, AAAI'20, and ICAPS'21 -- in which I developed a highly efficient SAT-based planner for HTN problems, including the ability to find optimal plans as well as the grounding as a preprocessing step. Here I will also give an outlook on future developments and new ideas that I propose for SAT-based planning -- including the exploitation of structures in plans (e.g.\ landmarks or operator-counting constraints). Next, I will present the idea of expressing lifted classical planning as SAT (ICAPS'22). The resulting planner LiSAT was the first lifted SAT-based planner -- it proved highly efficient and outperformed all other lifted planners at the time of publication. Notably, LiSAT was the first planner (lifted or grounded), and still is the only one, to solve the challenging OrganicSynthesis benchmark -- and could even prove optimality for all plans. I will also outline future ideas to further improve the efficiency of LiSAT. Lastly, I introduce the notion of planning with symbolic representations (AAAI'21 and ICAPS'23). Here one uses Binary Decision Diagrams (BDDs) to encode large sets of states efficiently. For expressing the additional structure encoded by HTNs, I show how BDDs can be suitably integrated into finite automata. Based on this representation, an efficient and optimal planning algorithm can be derived. Additionally, I show how this algorithm can be extended to also cover oversubscription planning.
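As a rough illustration of the SAT-based idea (a toy encoding invented for this summary, not the encodings used in the planners above): facts and actions at each time step become Boolean variables, and initial state, goal, preconditions, effects, and frame axioms become clauses, so that any satisfying assignment decodes into a plan.

```python
# Toy SAT encoding of a one-step planning problem (illustrative only).
from itertools import product

# Variables: 1 = at_A@0, 2 = at_B@0, 3 = move_A_B@0, 4 = at_A@1, 5 = at_B@1
clauses = [
    [1], [-2],          # initial state: at_A, not at_B
    [-3, 1],            # precondition: move_A_B@0 requires at_A@0
    [-3, 5], [-3, -4],  # effects: move_A_B@0 makes at_B@1 true and at_A@1 false
    [3, -5, 2],         # frame: at_B@1 only if it held before or the move happened
    [3, 4, -1],         # frame: at_A persists to step 1 unless the move happened
    [5],                # goal: at_B@1
]

def satisfiable(clauses, n_vars):
    """Naive SAT check by enumeration (fine for a toy encoding)."""
    for bits in product([False, True], repeat=n_vars):
        assign = {i + 1: b for i, b in enumerate(bits)}
        if all(any(assign[abs(l)] == (l > 0) for l in c) for c in clauses):
            return assign
    return None

model = satisfiable(clauses, 5)
print("plan found:", model is not None, "| move_A_B executed:", model[3])
```

A real SAT-based planner generates such clauses automatically for a bounded horizon, hands them to an off-the-shelf SAT solver, and increases the horizon until a plan is found.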
Significant progress in the field of fair machine learning (ML) has been made to counteract algorithmic discrimination against marginalized groups. However, fairness remains an active research area that is far from settled. One key bottleneck is the implicit assumption that the environments where ML is developed and deployed are certain and reliable. In a world characterized by volatility, uncertainty, complexity, and ambiguity, it is far from obvious whether what has been developed in algorithmic fairness can still serve its purpose. In this talk, I will first discuss how to improve algorithmic fairness under two kinds of predictive uncertainty: aleatoric uncertainty (randomness and ambiguity in the data) and epistemic uncertainty (a lack of data or knowledge). The former regards historical bias reflected in the data, and the latter corresponds to the bias perpetuated or amplified during model training due to a lack of data or knowledge. In particular, the first work studies pushing the fairness-utility trade-off through aleatoric uncertainty, and the second work investigates fair few-shot learning. The last work introduces coverage-based fairness, which ensures that different groups enjoy identical treatment and receive equal coverage.
My research strives to develop fundamental graph-centric learning algorithms to reduce the need for human supervision in low-resource scenarios. The focus is on achieving effective and reliable data-efficient learning on graphs, which can be summarized into three facets: (1) graph weakly-supervised learning; (2) graph few-shot learning; and (3) graph self-supervised learning.
Neural models, including large language models (LLMs), achieve superior performance on logical reasoning tasks such as question answering. To elicit reasoning capabilities from LLMs, recent works propose using the chain-of-thought (CoT) mechanism to generate both the reasoning chain and the answer, which enhances the model’s reasoning capabilities. However, due to the uninterpretable nature of LLMs and the extreme flexibility of free-form explanations, several challenges remain, such as inaccurate reasoning, hallucination, and misalignment with human preferences. In this talk, we will focus on (1) our design that leverages structured information (grounded to the context) for explainable complex question answering and reasoning; and (2) our multi-module interpretable framework for inductive reasoning, which conducts step-wise faithful reasoning with iterative feedback.
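As a minimal illustration of the CoT mechanism itself (the prompt wording and task are placeholders, not the systems or datasets studied in this work), the prompt asks the model to emit its reasoning chain before a clearly marked final answer, which is what makes the chain available for grounding and verification.

```python
# Illustrative chain-of-thought prompting: compared with a direct prompt,
# the CoT prompt asks the model to produce its reasoning chain first and
# then a clearly marked final answer.
question = "A train departs at 3pm and travels for 2 hours. When does it arrive?"

direct_prompt = f"Q: {question}\nA:"

cot_prompt = (
    f"Q: {question}\n"
    "A: Let's think step by step. "
    "After the reasoning, give the result on a line starting with 'Answer:'."
)

# The generated reasoning chain (not just the answer) can then be checked
# against the context, e.g., for grounding or faithfulness.
print(direct_prompt)
print("---")
print(cot_prompt)
```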
Models that learn from data are widely and rapidly being deployed today for real-world use, but they suffer from unforeseen failures due to distribution shift, adversarial attacks, noise and corruption, and data scarcity. Many failures also occur because modern AI tasks require reasoning beyond pattern matching -- and such reasoning abilities are difficult to formulate as data-based input-output function fitting. The reliability problem has become increasingly important under the new paradigm of semantic ``multimodal'' learning. My research provides avenues to develop robust and reliable computer vision systems, particularly by leveraging the interactions between vision and language. In this AAAI New Faculty Highlights talk, I will cover three thematic areas of my research: robustness in computer vision, open-domain reliability in visual reasoning, and challenges and opportunities in the evaluation of generative models. Readers are encouraged to refer to my website (www.tejasgokhale.com) for more details and updates from my lab's activities towards the goal of robust visual understanding.
Building autonomous agents that can process massive amounts of real-time sensor-captured data is essential for many real-world applications including autonomous vehicles, robotics, and AI in medicine. As the agent often needs to explore a dynamic environment, enabling it to learn over time without performance degradation is a desirable yet challenging goal. Continual learning aims to build a continual learner that can learn new concepts over the data stream while preserving previously learnt concepts. In the talk, I will survey three pieces of my recent research on continual learning: (i) supervised continual learning, (ii) unsupervised continual learning, and (iii) multi-modal continual learning. In the first work, I will discuss a supervised continual learning algorithm called MEGA which dynamically balances the old tasks and the new task. In the second work, I will discuss unsupervised continual learning algorithms which learn representations continually without access to the labels. In the third work, I will elaborate on an efficient continual learning algorithm that can learn multiple modalities continually without forgetting.
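For readers unfamiliar with the setting, the sketch below shows one generic continual-learning baseline, experience replay with reservoir sampling (an illustration of the problem setup only, not MEGA or the other algorithms surveyed above): a bounded memory of past-task examples is mixed into each update so new concepts do not overwrite old ones.

```python
# Generic replay-based continual learning (illustrative baseline, not MEGA):
# keep a bounded memory of past examples and mix it into every update.
import random

class ReplayLearner:
    def __init__(self, model, memory_size=200):
        self.model = model            # assumed to expose an `update(batch)` method
        self.memory = []              # stored (x, y) examples from earlier tasks
        self.memory_size = memory_size
        self.seen = 0                 # number of examples observed so far

    def observe(self, batch):
        # Balance the new-task batch with a sample of old-task examples.
        replay = random.sample(self.memory, min(len(self.memory), len(batch)))
        self.model.update(list(batch) + replay)
        # Reservoir sampling keeps the memory an unbiased sample of the stream.
        for example in batch:
            self.seen += 1
            if len(self.memory) < self.memory_size:
                self.memory.append(example)
            elif random.random() < self.memory_size / self.seen:
                self.memory[random.randrange(self.memory_size)] = example
```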
A critical challenge for the wide-scale adoption of reinforcement learning (RL) is the need to give domain experts assurance that learned policies will improve decision-making -- and not lead to unacceptable behavior. To meet this challenge, my work aims to develop new methods for offline policy evaluation in real-world RL domains. There has been much recent interest in offline evaluation, and many advances have been made. However, recent benchmarking efforts have also shown that there remains a substantial gap between current state-of-the-art methods and real-world domains such as robotics. Towards scalable offline evaluation, my group is investigating the use of methods for abstraction and representation learning. In this new faculty highlight, I will present our recent results that show the promise of this direction for scaling offline evaluation in RL domains. I will then describe future directions in this line of work that will further realize the promise of offline policy evaluation for increasing confidence in deployed RL.
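To ground the problem, the following sketch implements the simplest textbook offline policy evaluation estimator, per-trajectory importance sampling (an illustrative baseline; the abstraction- and representation-based methods discussed in the talk are not shown here): logged returns are reweighted by the likelihood ratio between the evaluation policy and the behavior policy.

```python
# Per-trajectory importance sampling for offline policy evaluation
# (textbook baseline, shown for orientation only).

def importance_sampling_ope(trajectories, pi_e, pi_b, gamma=0.99):
    """trajectories: list of [(state, action, reward), ...] logged under pi_b;
    pi_e(a, s) and pi_b(a, s) return action probabilities."""
    estimates = []
    for traj in trajectories:
        ratio, ret = 1.0, 0.0
        for t, (s, a, r) in enumerate(traj):
            ratio *= pi_e(a, s) / pi_b(a, s)   # cumulative likelihood ratio
            ret += (gamma ** t) * r            # discounted return of the trajectory
        estimates.append(ratio * ret)
    return sum(estimates) / len(estimates)

# Toy usage with made-up policies over two actions.
pi_e = lambda a, s: 0.8 if a == 0 else 0.2
pi_b = lambda a, s: 0.5
logged = [[(0, 0, 1.0), (1, 1, 0.0)], [(0, 1, 0.0), (1, 0, 1.0)]]
print(importance_sampling_ope(logged, pi_e, pi_b))
```

The variance of this estimator grows quickly with horizon, which is one reason abstraction and representation learning are attractive for scaling offline evaluation.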
The increasingly decentralized and private nature of data in our digital society has motivated the development of personalized, collaborative intelligent systems that enable knowledge aggregation across multiple data owners while accommodating their data privacy and system constraints. However, collaborative learning has only been investigated in simple and limited settings: isolated task scenarios where learning begins from scratch and does not build on prior expertise; settings where the learned model is represented in task-specific forms that do not generalize to unseen, emerging scenarios; and, more often, settings where a universal model representation is assumed across collaborators, ignoring their local compute constraints or input representations. This restricts its practicality in continual learning scenarios with limited task data, which demand continuous adaptation and knowledge transfer across different information silos, tasks, and learning models, as well as the utilization of prior solution expertise. To overcome these limitations, my research has focused on developing effective and scalable resource-aware collaborative learning frameworks across heterogeneous systems.
Deep learning has exhibited a number of surprising generalization phenomena that are not captured by classical statistical learning theory. This talk will survey some of my work on the theoretical characterizations of several such intriguing phenomena: (1) Implicit regularization: A major mystery in deep learning is that deep neural networks can often generalize well despite their excessive expressive capacity. Towards explaining this mystery, it has been suggested that commonly used gradient-based optimization algorithms enforce certain implicit regularization which effectively constrains the model capacity. (2) Benign overfitting: In certain scenarios, a model can perfectly fit noisily labeled training data, yet still achieve near-optimal test error at the same time, which is very different from the classical notion of overfitting. (3) Grokking: In certain scenarios, a model initially achieves perfect training accuracy but no generalization (i.e., no better than a random predictor), and upon further training, transitions to almost perfect generalization. Theoretically establishing these properties often involves making appropriate high-dimensional assumptions on the problem as well as a careful analysis of the training dynamics.
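As one concrete and well-known instance of implicit regularization (a classical result for linear models, cited here only as illustration and not as part of the work surveyed above): gradient descent on the logistic loss over linearly separable data converges in direction to the maximum-margin separator,

```latex
% Classical example of implicit regularization (linear model, logistic loss,
% linearly separable data); included purely as illustration.
\[
  \lim_{t \to \infty} \frac{w_t}{\|w_t\|_2}
  = \frac{w_{\mathrm{SVM}}}{\|w_{\mathrm{SVM}}\|_2},
  \qquad
  w_{\mathrm{SVM}} = \arg\min_{w} \|w\|_2^2
  \ \ \text{s.t.}\ \ y_i\, w^{\top} x_i \ge 1 \ \ \forall i,
\]
% even though nothing in the loss explicitly penalizes the norm of w.
```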
Recent years have seen a surge in research that develops and applies machine learning algorithms to create intelligent learning systems. However, traditional machine learning algorithms have primarily focused on optimizing accuracy and efficiency, and they often fail to consider how to foster trustworthiness in their design. As a result, machine learning models usually face a trust crisis in real-world applications. Driven by these urgent concerns about trustworthiness, in this talk, I will introduce my research efforts towards the goal of making machine learning trustworthy. Specifically, I will delve into the following key research topics: security vulnerabilities and robustness, model explanations, and privacy-preserving mechanisms.
Many learning tasks in Artificial Intelligence (AI) require dealing with graph data, ranging from biology and chemistry to finance and education. As powerful deep learning tools for graphs, graph neural networks (GNNs) have demonstrated remarkable performance in various graph-related applications. Despite the significant accomplishments of GNNs, recent studies have highlighted that their efficiency and effectiveness face significant challenges, such as adversarial robustness and scalability, which are fundamentally linked to data. While major attention has been devoted to improving GNNs from the model perspective, the potential of directly enhancing the data has often been overlooked. This underscores a critical gap in GNN research: while model improvements are undoubtedly important, we also need to recognize and address the data-related factors contributing to these challenges. Hence, my research investigates solutions to these challenges from the data perspective, employing strategies such as data characterization, reduction, augmentation, transformation, and detection.
This talk surveys three related research contributions that shed light on the current US political divide: 1. a novel machine-translation-based framework to quantify political polarization; 2. an analysis of disparate media portrayal of US policing in major cable news outlets; and 3. a novel perspective of vicarious offense that examines a timely and important question -- how well do Democratic-leaning users perceive what content would be deemed as offensive by their Republican-leaning counterparts or vice-versa?
For robots to robustly and flexibly interact with humans, they need to acquire skills that can be used across scenarios. One way to enable the generalization of skills is to learn representations that are useful for downstream tasks. Learning a representation for interactions requires an understanding of both what to interact with (e.g., objects) and how to interact (e.g., actions, controls, and manners). However, most existing language or visual representations mainly focus on objects. To enable robust human-robot interactions, we need a representation that is not just grounded at the object level but can also reason at the action level. The ability to reason about an agent’s own actions and others’ actions will be crucial for long-tail interactions. My research focuses on leveraging the compositional nature of language and reward functions to learn representations that generalize to novel scenarios. Together with information from multiple modalities, the learned representation can reason about task progress, future behaviors, and the goals/beliefs of an agent. The above ideas have been demonstrated in my research on building robots that understand language and engage in social interactions.
The conventional wisdom favoring simple models in machine learning misses the bigger picture, especially for over-parameterized neural networks (NNs), where the number of parameters is much larger than the number of training examples. Our goal is to explore the mystery behind over-parameterized models from a theoretical perspective. In this talk, I will discuss the role of over-parameterization in neural networks, in order to theoretically understand why they can perform well. First, I will discuss the role of over-parameterization in neural networks from the perspective of models, to theoretically understand why they can generalize well. Second, I will discuss the effects of over-parameterization on robustness and privacy. Third, I will discuss over-parameterization from kernel methods to neural networks through the lens of function space theory. Finally, moving from classical statistical learning to sequential decision making, I will talk about the benefits of over-parameterization in understanding how deep reinforcement learning works well with function approximation. Potential future directions in the theory of over-parameterized ML will also be discussed.
The current analysis of federated optimization algorithms for training deep neural networks assumes that the data is non-sequential (e.g., images), which yields a smooth loss objective. In contrast, edge devices generate lots of sequential data every day, where these sequences exhibit significant sequential correlation across time stamps (e.g., text messages). In order to learn from such sequential data, people typically use a class of neural networks that is inherently nonsmooth, with a potentially unbounded smoothness parameter. Examples include recurrent neural networks, long short-term memory networks, and transformers. It remains unclear how to design provably efficient algorithms for training these neural networks to learn from sequential data. My goal is to lay the algorithmic foundation of federated learning with sequential data, contributing novel algorithms for learning from a range of real-world sequential data (e.g., natural language, electronic health records, transportation, time series, etc.) using state-of-the-art deep neural networks. In this talk, I will first motivate the problem by showing that the transformer, which is widely used for sequential data learning, has a loss landscape with unbounded smoothness. Then, I will introduce provably efficient federated deep learning algorithms in the presence of unbounded smoothness. In particular, I will introduce a few efficient algorithms for various settings of federated learning, including homogeneous data, heterogeneous data, and partial client participation. The main result is twofold. First, we show that the designed algorithms provably achieve small computational and communication complexities. Second, we establish fundamental hardness results in the unbounded smoothness setting. Ultimately, I will discuss the future challenges of extending our research framework from small-scale neural networks to large language models.
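One common way to formalize this "unbounded smoothness" regime is relaxed, or $(L_0, L_1)$-smoothness (a standard assumption in the literature, stated here for orientation; the exact condition used in the talk's analyses may differ):

```latex
% Relaxed (L_0, L_1)-smoothness: the local smoothness constant may grow
% with the gradient norm, so it is not bounded by any single global L.
\[
  \|\nabla^2 f(x)\| \;\le\; L_0 + L_1 \,\|\nabla f(x)\|
  \qquad \text{for all } x.
\]
% Standard L-smoothness is recovered with L_1 = 0; gradient clipping and
% normalization are typical algorithmic responses to the case L_1 > 0.
```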
Graphs (i.e., networks) are ubiquitous in daily life, as they can effectively model a plethora of real-world systems with connected units, such as social networks and biological networks. Recent years have witnessed rapid development in graph-based machine learning (GML) in various high-impact domains. Currently, the mainstream GML methods are based on statistical learning, e.g., utilizing the statistical correlations between node features, graph structure, and labels for node classification. However, statistical learning has been widely criticized for only capturing the superficial relations between variables in the data system, which consequently undermines trustworthiness in real-world applications. Therefore, it is crucial to understand the causality in the data system and the learning process. Causal inference is the discipline that investigates the causality inside a system, for example, to identify and estimate the causal effect of a certain treatment (e.g., wearing a face mask) on an important outcome (e.g., COVID-19 infection). Incorporating the concepts and philosophy of causal inference into ML methods is often considered significant for human-level intelligence and can serve as a foundation of artificial intelligence (AI). However, most traditional causal inference studies rely on strong assumptions and focus on independent and identically distributed (i.i.d.) data, while causal inference on graphs faces many barriers. Therefore, we aim to bridge the gap between causal inference and GML.
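For concreteness, the standard estimand in the example just given is the average treatment effect, written here in potential-outcome notation (the notation is mine and included purely for illustration):

```latex
% Average treatment effect of treatment T (e.g., mask wearing) on outcome Y
% (e.g., infection), in potential-outcome notation.
\[
  \tau \;=\; \mathbb{E}\bigl[\,Y(1) - Y(0)\,\bigr],
\]
% which generally differs from the observed difference
% \mathbb{E}[Y \mid T = 1] - \mathbb{E}[Y \mid T = 0] under confounding;
% on graphs, interference between connected units complicates this further.
```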
Language acquisition and utilization transcend the mere exchange of lexical units. Visual cues, prosody, gestures, body movements, and context play an undeniably crucial role. Humans naturally communicate multimodally, employing multiple channels and synthesizing information from diverse modalities. My research delves into the characterization and construction of multimodal models that seamlessly integrate data from multiple independent modalities. I will cover recent work that highlights the challenges, achievements, and opportunities towards developing capable multimodal discursive models.
The integration of learning and reasoning is one of the key challenges in artificial intelligence and machine learning today. The area of Neuro-Symbolic AI (NeSy) tackles this challenge by integrating symbolic reasoning with neural networks. In our recent work, we provided an introduction to NeSy by drawing several parallels to another field that has a rich tradition in integrating learning and reasoning, namely Statistical Relational Artificial Intelligence (StarAI).
The integration of advances from machine learning and computer vision with the classical autonomy stack has brought successful robot deployments in fulfilment, manufacturing, and transportation. However, unstructured and dynamic environments such as pedestrian spaces and streets, workplaces, and homes pose additional challenges such as modeling human behavior, understanding user perceptions, and ensuring human safety and comfort. My work addresses such challenges to enable robots to fluently work with and around people to increase productivity and assist users.
Inverse reinforcement learning (IRL) has seen significant advancements in recent years. This class of approaches aims to efficiently learn the underlying reward function that rationalizes the behavior exhibited by expert agents, often represented by humans. In contrast to mere behavioral cloning, the reconstruction of a reward function yields appealing implications, as it allows for more effective interpretability of the expert’s decisions and provides a transferable specification of the expert’s objectives for application in even different environments. Unlike the well-understood field of reinforcement learning (RL) from a theoretical perspective, IRL still grapples with limited understanding, significantly constraining its applicability. A fundamental challenge in IRL is the inherent ambiguity in selecting a reward function, given the existence of multiple candidate functions, all explaining the expert’s behavior. In this talk, I will survey three of my papers that have made notable contributions to the IRL field: “Provably Efficient Learning of Transferable Rewards”, “Towards Theoretical Understanding of Inverse Reinforcement Learning”, and “Inverse Reinforcement Learning with Sub-optimal Experts”. The central innovation introduced by the first paper is a novel formulation of the IRL problem that overcomes the issue of ambiguity. IRL is reframed as the problem of learning the feasible reward set, which is the set of all rewards that can explain the expert’s behavior. This approach postpones the selection of the reward function, thereby circumventing the ambiguity issues. Furthermore, the feasible reward set exhibits convenient geometric properties that enable the development of efficient algorithms for its computation. Building on this novel formulation of IRL, the second paper addresses the problem of efficiently learning the feasible reward set when the environment and the expert’s policy are not known in advance. It introduces a novel way to assess the dissimilarity between feasible reward sets based on the Hausdorff distance and presents a new PAC (probably approximately correct) framework. The most significant contribution of this paper is the introduction of the first sample complexity lower bound, which highlights the challenges inherent in the IRL problem. Deriving this lower bound necessitated the development of novel technical tools. The paper also demonstrates that when a generative model of the environment is available, a uniform sampling strategy achieves a sample complexity that matches the lower bound, up to logarithmic factors. Finally, in the third paper, the IRL problem in the presence of sub-optimal experts is investigated. Specifically, the paper assumes the availability of multiple sub-optimal experts, in addition to the expert agent, who provide additional demonstrations together with a known bound on their maximum degree of sub-optimality. The paper shows that this richer information mitigates the ambiguity problem, significantly reducing the size of the feasible reward set while retaining its favorable geometric properties. Furthermore, the paper explores the associated statistical problem and derives novel lower bounds for sample complexity, along with almost matching algorithms. These selected papers represent notable advancements in IRL, contributing to the establishment of a solid theoretical foundation for IRL and extending the framework to accommodate scenarios with sub-optimal experts.
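In standard notation (assumed here for illustration rather than quoted from the papers), the two central objects just described are the feasible reward set and the Hausdorff distance used to compare estimates of it:

```latex
% Feasible reward set: all rewards under which the expert's policy is optimal
% in the given MDP (standard notation, used only for illustration).
\[
  \mathcal{R}_{\mathrm{feas}}
  \;=\;
  \bigl\{\, r \;:\; \pi^{E} \text{ is an optimal policy of } (\mathcal{S}, \mathcal{A}, P, r, \gamma) \,\bigr\}.
\]
% Hausdorff distance between the true and the estimated feasible sets,
% the dissimilarity measure underlying the PAC framework.
\[
  d_{H}\bigl(\mathcal{R}, \widehat{\mathcal{R}}\bigr)
  \;=\;
  \max\Bigl\{
    \sup_{r \in \mathcal{R}} \inf_{\hat r \in \widehat{\mathcal{R}}} \| r - \hat r \|,\;
    \sup_{\hat r \in \widehat{\mathcal{R}}} \inf_{r \in \mathcal{R}} \| r - \hat r \|
  \Bigr\}.
\]
```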
A key challenge in Machine Learning (ML) is the identification of geometric structure in high-dimensional data. Most algorithms assume that data lives in a high-dimensional vector space; however, many applications involve non-Euclidean data, such as graphs, strings and matrices, or data whose structure is determined by symmetries in the underlying system. Here, we discuss methods for identifying geometric structure in data and how leveraging data geometry can give rise to efficient ML algorithms with provable guarantees.
Deep neural networks (DNNs) have achieved unprecedented success across many scientific and engineering fields in the last decades. Despite this empirical success, unfortunately, recent studies have shown that there are various failure modes and blind spots in DNN models which may result in unexpected serious failures and potential harms, e.g., the existence of adversarial examples and sensitivity to small perturbations. This is not acceptable, especially for safety-critical and high-stakes applications in the real world, including healthcare, self-driving cars, aircraft control systems, hiring, and malware detection protocols. Moreover, it has been challenging to understand why and when DNNs will fail due to their complicated structures and black-box behaviors. Lack of interpretability is one critical issue that may seriously hinder the deployment of DNNs in high-stakes applications, which need interpretability to trust the prediction, to understand potential failures, and to be able to mitigate harms and eliminate biases in the model. To make DNNs trustworthy and reliable for deployment, it is necessary and urgent to develop methods and tools that can (i) quantify and improve their robustness against adversarial and natural perturbations, and (ii) understand their underlying behaviors and further correct errors to prevent injuries and damages. These are the important first steps to enable Trustworthy AI and Trustworthy Machine Learning. In this talk, I will survey a series of research efforts in my lab devoted to tackling the grand challenges in (i) and (ii). In the first part of my talk, I will overview our research effort in Robust Machine Learning since 2017, where we have proposed the first attack-agnostic robustness evaluation metric, the first efficient robustness certification algorithms for various types of perturbations, and efficient robust learning algorithms ranging from supervised learning to deep reinforcement learning. In the second part of my talk, I will survey a series of exciting results in my lab on accelerating interpretable machine learning and explainable AI. Specifically, I will show how we could bring interpretability into deep learning by leveraging recent advances in multi-modal models. I'll present recent works in our group on automatically dissecting neural networks with open-vocabulary concepts, designing interpretable neural networks without concept labels, and briefly overview our recent efforts on demystifying the black-box DNN training process, automated neuron explanations for Large Language Models, and the first robustness evaluation of a family of neuron-level interpretation techniques.
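For orientation, the quantity that robustness certification methods of this kind bound is the minimum adversarial distortion at a given input (a standard definition, included here as illustration rather than as a statement of the specific metrics and certificates surveyed above):

```latex
% Minimum adversarial perturbation at input x_0 for classifier f;
% a certification algorithm returns a verified lower bound on this quantity.
\[
  \epsilon^{*}(x_0)
  \;=\;
  \min_{\delta} \ \|\delta\|_{p}
  \quad \text{s.t.} \quad f(x_0 + \delta) \neq f(x_0).
\]
% Any certified radius no larger than \epsilon^{*}(x_0) guarantees that no
% perturbation of p-norm within that radius changes the prediction.
```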
The real-world deployment of machine learning algorithms often poses challenges due to shifts in data distributions and tasks. These shifts can lead to a degradation in model performance, as the model may not have encountered such changes during training. Additionally, they can make it difficult for the model to generalize to new scenarios and can result in poor performance in real-world applications. In this talk, I will present our research on building machine learning models that are highly generalizable and easily adaptable to different shifts. Specifically, I will first discuss our approach to improving out-of-distribution robustness and mitigating spurious correlations by training environment-invariant models through selective augmentation and post-hoc rectification. Second, I will present our techniques for continuous and rapid adaptation of models to new tasks and environments. This includes methods to facilitate compositional generalization and adaptation by extracting relationships from historical observations and to enhance reliable adaptation even in the face of imperfect observations. Additionally, I will showcase our successful practices for addressing shifts in real-world applications, such as in the healthcare, e-commerce, and transportation industries. The talk will also touch upon the remaining challenges and outline future research directions in this area.
Relational structured data is a way of representing knowledge using nodes and edges, while also capturing the meaning of that knowledge in a structured form that can be used for machine learning. Compared with vision and natural language data, relational structured data represents and manipulates structured knowledge, which can be beneficial for tasks that involve reasoning or inference. On the other hand, vision and NLP deal more with unstructured data (like images and text), and they often require different types of models and algorithms to extract useful information or features from the data. Human-like Learning develops methods that can harness relational structures and learning-to-learn to rapidly acquire and generalize knowledge to new tasks and situations. With Human-like Learning, the learning algorithm is efficient and can adapt to new or unseen situations, which is crucial in real-world applications where environments may change unpredictably. Moreover, the models are easier for humans to understand and interpret, which is important for transparency and trust in AI systems. In this talk, we present our recent attempts towards human-like learning from relational structured data.