AAAI.2026 - Special Track on AI for Social Impact

Total: 162

#1 Bi-Level Contextual Bandits for Individualized Resource Allocation Under Delayed Feedback

Authors: Mohammadsina Almasi, Hadis Anahideh

Equitably allocating limited resources in high-stakes domains—such as education, employment, and healthcare—requires balancing short-term utility with long-term impact, while accounting for delayed outcomes, hidden heterogeneity, and ethical constraints. However, most learning-based allocation frameworks either assume immediate feedback or ignore the complex interplay between individual characteristics and intervention dynamics. We propose a novel bi-level contextual bandit framework for individualized resource allocation under delayed feedback, designed to operate in real-world settings with dynamic populations, capacity constraints, and time-sensitive impact. At the meta level, the model optimizes subgroup-level budget allocations to satisfy fairness and operational constraints. At the base level, it identifies the most responsive individuals within each group using a neural network trained on observational data, while respecting cooldown windows and delayed treatment effects modeled via resource-specific delay kernels. By explicitly modeling temporal dynamics and feedback delays, the algorithm continually refines its policy as new data arrive, enabling more responsive and adaptive decision-making. We validate our approach on two real-world datasets from education and workforce development, showing that it achieves higher cumulative outcomes, better adapts to delay structures, and ensures equitable distribution across subgroups. Our results highlight the potential of delay-aware, data-driven decision-making systems to improve institutional policy and social welfare.

Subject: AAAI.2026 - Special Track on AI for Social Impact


#2 Urban Incident Prediction with Graph Neural Networks: Integrating Government Ratings and Crowdsourced Reports

Authors: Sidhika Balachandar, Shuvom Sadhuka, Bonnie Berger, Emma Pierson, Nikhil Garg

Graph neural networks (GNNs) are widely used in urban spatiotemporal forecasting, e.g., predicting infrastructure problems. In this setting, government officials aim to identify in which neighborhoods incidents like potholes or rodents occur. The true state of incidents is observed via government inspection ratings. However, these ratings are only conducted for a sparse set of neighborhoods and incident types. We also observe the state of incidents via crowdsourced reports, which are more densely observed but may be biased due to heterogeneous reporting. First, we propose a multiview, multioutput GNN-based model that uses both unbiased rating data and biased reporting data to predict the true latent state of incidents. Second, we investigate a case study of New York City urban incidents and collect a dataset of 9,615,863 crowdsourced reports and 1,041,415 government inspection ratings over 3 years and across 139 types of incidents. We show on both real and semi-synthetic data that our model can better predict the latent state compared to models that use only reporting data or only rating data. Finally, we quantify demographic biases in crowdsourced reporting, e.g., higher-income neighborhoods report problems at higher rates. Our analysis showcases a widely applicable approach for latent state prediction using heterogeneous, sparse, and biased data.


#3 Harnessing Diffusion-Generated Synthetic Images for Fair Image Classification

Authors: Abhipsa Basu, Aviral Gupta, Abhijnya Bhat, Venkatesh Babu Radhakrishnan

Image classification systems often inherit biases from uneven group representation in training data. For example, in face datasets for hair color classification, blond hair may be disproportionately associated with females, reinforcing stereotypes. A recent approach leverages the Stable Diffusion model to generate balanced training data, but these models often struggle to preserve the original data distribution. In this work, we explore multiple diffusion-finetuning techniques, e.g., LoRA and DreamBooth, to generate images that more accurately represent each training group by learning directly from their samples. Additionally, in order to prevent a single DreamBooth model from being overwhelmed by excessive intra-group variations, we explore a technique of clustering images within each group and train a DreamBooth model per cluster. These models are then used to generate group-balanced data for pretraining, followed by fine-tuning on real data. Experiments on multiple benchmarks demonstrate that the studied finetuning approaches outperform vanilla Stable Diffusion on average and achieve results comparable to SOTA debiasing techniques like Group-DRO, while surpassing them as the dataset bias severity increases.


#4 Bootstrapping Personalized Insulin Therapy via Model-Based Reinforcement Learning: An In Silico Study

Authors: Sumana Basu, Flemming Kondrup, Adriana Romero-Soriano, Doina Precup

Personalized insulin therapy for individuals with Type 1 Diabetes via closed‑loop artificial pancreas systems requires rapid adaptation of dosing strategies to each patient's unique insulin response. However, learning patient‑specific policies from scratch demands extensive exploration, which is often impractical. In this work, we study a framework that integrates insulin-response-informed transfer learning with model-based reinforcement learning for insulin dosing. We first train an LSTM‑based insulin responsiveness predictor on virtual patients, using their glucose, insulin, and meal history to forecast future glucose levels. Analysis of insulin responsiveness of in-silico patients uncovers natural insulin‑response groups characterized by similar sensitivity and dynamics profiles. For a new patient, we identify a representative model from their response group and use it to generate synthetic trajectories. These trajectories are integrated into an enhanced H-step Deep Dyna-Q algorithm, enabling accelerated policy optimization through model-based planning. The dynamics model trained entirely in simulation achieves 91.31% accuracy in predicting blood glucose ranges on the Ohio Type 1 Diabetes dataset, indicating strong zero-shot generalization. Additionally, we find that bootstrapping a new patient with a physiologically-matched reference model accelerates convergence of effective dosing policies across in-silico cohorts of children, adolescents, and adults. These findings suggest that leveraging response-group-specific synthetic experience can expedite personalized insulin therapy, offering a promising pathway towards clinical validation.


#5 An External Fairness Evaluation of LinkedIn Talent Search

Authors: Tina Behzad, Siddartha Devic, Vatsal Sharan, Aleksandra Korolova, David Kempe

We conduct an independent, third-party audit for bias of LinkedIn's Talent Search ranking system, focusing on potential ranking bias across two attributes: gender and race. To do so, we first construct a dataset of rankings produced by the system, collecting extensive Talent Search results across a diverse set of occupational queries. We then develop a robust labeling pipeline that infers the two demographic attributes of interest for the returned users. To evaluate potential biases in the collected dataset of real-world rankings, we utilize two exposure disparity metrics: deviation from group proportions and MinSkew@k. Our analysis reveals an under-representation of minority groups in early ranks across many queries. We further examine potential causes of this disparity, and discuss why they may be difficult or, in some cases, impossible to fully eliminate among the early ranks of queries. Beyond static metrics, we also investigate the concept of subgroup fairness over time, highlighting temporal disparities in exposure and retention, which are often more difficult to audit for in practice. In employer recruiting platforms such as LinkedIn Talent Search, the persistence of a particular candidate over multiple days in the ranking can directly impact the probability that the given candidate is selected for opportunities. Our analysis reveals demographic disparities in this temporal stability, with some groups experiencing greater volatility in their ranked positions than others. We contextualize all our findings alongside LinkedIn's published self-audits of its Talent Search system and reflect on the methodological constraints of a black-box external evaluation, including limited observability and noisy demographic inference. Our work contributes empirical insights and practical guidance for conducting third-party audits of modern socio-technical systems which go beyond the well-studied and standard algorithmic fairness guarantees of predictors.
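
The MinSkew@k metric used in this audit can be illustrated with a small sketch. The form below follows the common definition of skew as the log-ratio between a group's share of the top-k results and its share of the full candidate pool, with MinSkew@k taking the minimum over groups; the function name and toy data here are illustrative, not taken from the paper.

```python
import math

def min_skew_at_k(ranking, overall, k):
    """MinSkew@k: the minimum, over demographic groups, of the log-ratio
    between a group's share of the top-k results and its overall share.
    Negative values flag the most under-represented group in early ranks."""
    topk = ranking[:k]
    skews = []
    for group, p in overall.items():
        share = sum(1 for g in topk if g == group) / k
        # A group absent from the top-k would give log(0); floor the share
        # with a tiny epsilon to keep the value finite.
        skews.append(math.log(max(share, 1e-12) / p))
    return min(skews)

# Toy example: group "b" is 25% of candidates but 1/3 of the top 3,
# so group "a" (over-represented overall) is the most under-exposed here.
ranking = ["a", "a", "b", "a"]
overall = {"a": 0.75, "b": 0.25}
print(round(min_skew_at_k(ranking, overall, 3), 3))  # prints -0.118
```

A value near 0 for every group would indicate that early ranks mirror the candidate pool; the audit's "deviation from group proportions" metric compares the same two shares directly rather than through a log-ratio.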


#6 Mapping on a Budget: Optimizing Spatial Data Collection for ML

Authors: Livia Betti, Farooq Sanni, Gnouyaro Z. Sogoyou, Togbe Agbagla, Cullen Molitor, Tamma Carleton, Esther Rolf

In applications across agriculture, ecology, and human development, machine learning with satellite imagery (SatML) is limited by the sparsity of labeled training data. While satellite data cover the globe, labeled training datasets for SatML are often small, spatially clustered, and collected for other purposes (e.g., administrative surveys or field measurements). Despite the pervasiveness of this issue in practice, past SatML research has largely focused on new model architectures and training algorithms to handle scarce training data, rather than modeling data conditions directly. This leaves scientists and policymakers who wish to use SatML for large-scale monitoring uncertain about whether and how to collect additional data to maximize performance. Here, we present the first problem formulation for the optimization of spatial training data in the presence of heterogeneous data collection costs and realistic budget constraints, as well as novel methods for addressing this problem. In experiments simulating different problem settings across three continents and four tasks, our strategies reveal substantial gains from sample optimization. Further experiments delineate settings for which optimized sampling is particularly effective. The problem formulation and methods we introduce are designed to generalize across application domains for SatML; we put special emphasis on a specific problem setting where our coauthors can immediately use our findings to augment clustered agricultural surveys for SatML monitoring in Togo.


#7 Self-Regulating Cars: Automating Traffic Control in Free Flow Road Networks

Authors: Ankit Bhardwaj, Rohail Asim, Sachin Kumar Chauhan, Yasir Zaki, Lakshmi Subramanian

Free-flow road networks, such as suburban highways, are increasingly experiencing traffic congestion due to growing commuter inflow and limited infrastructure. Traditional control mechanisms—traffic signals or local heuristics—are ineffective or infeasible in these high-speed, signal-free environments. We introduce self-regulating cars, a reinforcement learning-based traffic control protocol that dynamically modulates vehicle speeds to optimize throughput and prevent congestion, without requiring new physical infrastructure. Our approach integrates classical traffic flow theory, gap acceptance models, and microscopic simulation into a physics-informed RL framework. By abstracting roads into super-segments, the agent captures emergent flow dynamics and learns robust speed modulation policies from instantaneous traffic observations. Evaluated in the high-fidelity PTV Vissim simulator on a real-world highway network, our method improves total throughput by 5%, reduces average delay by 13%, and decreases total stops by 3% compared to the no-control setting. It also achieves smoother, congestion-resistant flow while generalizing across varied traffic patterns—demonstrating its potential for scalable, ML-driven traffic management.


#8 On the Feasibility of Using MultiModal LLMs to Execute AR Social Engineering Attacks

Authors: Ting Bi, Chenghang Ye, Zheyu Yang, Ziyi Zhou, Cui Tang, Zui Tao, Jun Zhang, Kailong Wang, Liting Zhou, Yang Yang, Tianlong Yu

Augmented Reality (AR) and Multimodal Large Language Models (LLMs) are rapidly evolving, providing unprecedented capabilities for human-computer interaction. However, their integration introduces a new attack surface for Social Engineering (SE). In this paper, we systematically investigate, for the first time, the feasibility of orchestrating AR-driven Social Engineering attacks using Multimodal LLMs, via our proposed SEAR framework, which operates through three key phases: (1) AR-based social context synthesis, which fuses multimodal inputs (visual, auditory, and environmental cues); (2) role-based multimodal RAG (Retrieval-Augmented Generation), which dynamically retrieves and integrates social context; and (3) ReInteract social engineering agents, which execute adaptive multiphase attack strategies through inference interaction loops. To verify SEAR, we conducted an IRB-approved study with 60 participants and built a novel dataset of 180 annotated conversations in different social scenarios (e.g., coffee shops, networking events). Our results show that SEAR is highly effective at eliciting high-risk behaviors (e.g., 93.3% of participants susceptible to email phishing). The framework was particularly effective in building trust, with 85% of targets willing to accept an attacker's call after an interaction. We also identified notable limitations such as authenticity gaps. This work provides proof-of-concept for AR-LLM driven social engineering attacks and insights for developing defenses against next-generation AR/LLM-based SE threats.


#9 Can LLMs Identify Tax Abuse?

Authors: Andrew Blair-Stanek, Nils Holzenberger, Benjamin Van Durme

We investigate whether large language models can discover and analyze U.S. tax-minimization strategies. This real-world domain challenges even seasoned human experts, and progress can reduce tax revenue lost from well-advised, wealthy taxpayers. We evaluate the most advanced LLMs on their ability to (1) interpret and verify tax strategies, (2) fill in gaps in partially specified strategies, and (3) generate complete, end-to-end strategies from scratch. This domain should be of particular interest to the LLM reasoning community: unlike synthetic challenge problems or scientific reasoning tasks, U.S. tax law involves navigating hundreds of thousands of pages of statutes, case law, and administrative guidance, all updated regularly. Notably, an LLM identified an apparently novel tax strategy, highlighting these models' potential to revolutionize tax agencies' fight against tax abuse.


#10 A Theoretical Model for Grit in Pursuing Ambitious Ends

Authors: Avrim Blum, Emily Diana, Kavya Ravichandran, Alexander Williams Tolbert

Ambition and risk-taking have been heralded as important ways for marginalized communities to get out of cycles of poverty. As a result, educational messaging often encourages individuals to strengthen their personal resolve and develop characteristics such as discipline and grit to succeed in ambitious ends. However, recent work in philosophy and sociology highlights that this messaging often does more harm than good for students in these situations. We study similar questions using a different epistemic approach and in simple theoretical models: we provide a quantitative model of decision-making between stable and risky choices in the improving multi-armed bandits framework. We use this model to first study how individuals' "strategies" are affected by their level of grittiness and how this affects their accrued rewards. Then, we study the impact of various interventions, such as increasing grit or providing a financial safety net. Our investigation of rational decision making studies the competitive ratio between the accrued reward and the optimal reward.


#11 Democratizing Writing Support with AI: Insights from One Year of Real-World Interactions with an Open-Access Writing Feedback Tool

Authors: Babette Bühler, Ivo Bueno, Enkelejda Kasneci

Writing is a foundational skill for educational, professional, and civic participation, yet access to frequent and timely writing feedback remains deeply unequal. Teachers face significant workload constraints, particularly in large classes, and many learners lack alternative sources of individualized feedback. While large language models (LLMs) offer the opportunity for scalable, adaptive support, little is known about how students engage with such feedback tools in real-world, self-directed settings. We present a large-scale, year-long analysis of 23,650 voluntary interactions with an open-access AI writing feedback system used by students across diverse educational contexts and age groups, conducted in accordance with strict data protection standards. Using a clustering approach, we identify 2,800 iterative revision chains and apply a validated LLM-based multidimensional scoring framework to assess text quality over time. Our findings reveal that students who revised their texts after receiving AI feedback demonstrated statistically significant, albeit modest, improvements across both content and language-related dimensions (overall writing quality: ∆ = 0.067, p < .001, r = .17), with the greatest gains observed among initially low-performing writers. Revision frequency was positively associated with improvement, particularly in higher-order writing skills. However, engagement was uneven, with higher usage among students in academically oriented schools. These results demonstrate both the technical feasibility and social potential of deploying generative AI for educational support at scale, while highlighting the need for inclusive infrastructure, accessible design, and targeted outreach to truly democratize educational benefits.


#12 Driving with Regulation: Trustworthy and Interpretable Decision-Making for Autonomous Driving with Retrieval-Augmented Reasoning

Authors: Tianhui Cai, Yifan Liu, Zewei Zhou, Haoxuan Ma, Seth Z. Zhao, Zhiwen Wu, Xu Han, Zhiyu Huang, Jiaqi Ma

Understanding and adhering to traffic regulations is essential for autonomous vehicles to ensure safety and trustworthiness. However, traffic regulations are complex, context-dependent, and differ between regions, posing a major challenge to conventional rule-based decision-making approaches. We present an interpretable, regulation-aware decision-making framework, DriveReg, which enables autonomous vehicles to understand and adhere to region-specific traffic laws and safety guidelines. The framework integrates a Retrieval Augmented Generation (RAG)-based Traffic Regulation Retrieval Agent, which retrieves relevant rules from regulatory documents based on the current situation, and a Large Language Model (LLM)-powered Reasoning Agent that evaluates actions for legal compliance and safety. Our design emphasizes interpretability to enhance transparency and trustworthiness. To support systematic evaluation, we introduce DriveReg Scenarios Dataset, a comprehensive dataset of driving scenarios across Boston, Singapore, and Los Angeles, with both hypothesized text-based cases and real-world driving data, specifically constructed and annotated to evaluate models’ capacity for regulation understanding and reasoning. We validate our framework on the DriveReg Scenarios Dataset and real-world deployment, demonstrating strong performance and robustness across diverse environments.


#13 Measuring Model Performance in the Presence of an Intervention

Authors: Winston Chen, Michael W. Sjoding, Jenna Wiens

AI models are often evaluated based on their ability to predict the outcome of interest. However, in many AI for social impact applications, the presence of an intervention that affects the outcome can bias the evaluation. Randomized controlled trials (RCTs) randomly assign interventions, allowing data from the control group to be used for unbiased model evaluation. However, this approach is inefficient because it ignores data from the treatment group. Given the complexity and cost often associated with RCTs, making the most use of the data is essential. Thus, we investigate model evaluation strategies that leverage all data from an RCT. First, we theoretically quantify the estimation bias that arises from naïvely aggregating performance estimates from treatment and control groups and derive the condition under which this bias leads to incorrect model selection. Leveraging these theoretical insights, we propose nuisance parameter weighting (NPW), an unbiased model evaluation approach that reweights data from the treatment group to mimic the distribution of samples that would or would not experience the outcome under no intervention. Using synthetic and real-world datasets, we demonstrate that our proposed evaluation approach consistently yields better model selection than the standard approach, which ignores data from the treatment group, across various intervention effect and sample size settings. Our contribution represents a meaningful step towards more efficient model evaluation in real-world contexts.


#14 Fragile by Design: On the Limits of Adversarial Defenses in Personalized DreamBooth Generation

Authors: Zhen Chen, Yi Zhang, Xiangyu Yin, Chengxuan Qin, Xingyu Zhao, Xiaowei Huang, Wenjie Ruan

Personalized AI applications such as DreamBooth enable the generation of customized content from user images, but also raise significant privacy concerns, particularly the risk of facial identity leakage. Recent defense mechanisms like Anti-DreamBooth attempt to mitigate this risk by injecting adversarial perturbations into user photos to prevent successful personalization. However, we identify two critical yet overlooked limitations of these methods. First, the adversarial examples often exhibit perceptible artifacts such as conspicuous patterns or stripes, making them easily detectable as manipulated content. Second, the perturbations are highly fragile, as even a simple, non-learned filter can effectively remove them, thereby restoring the model's ability to memorize and reproduce user identity. To investigate this vulnerability, we propose a novel evaluation framework, AntiDB_Purify, to systematically evaluate existing defenses under realistic purification threats, including both traditional image filters and adversarial purification. Results reveal that none of the current methods maintains their protective effectiveness under such threats. These findings highlight that current defenses offer a false sense of security and underscore the urgent need for more imperceptible and robust protections to safeguard user identity in personalized generation.


#15 Optimizing Health Coverage in Ethiopia: A Learning-augmented Approach and Persistent Proportionality Under an Online Budget

Authors: Davin Choo, Yohai Trabelsi, Fentabil Getnet, Samson Warkaye Lamma, Wondesen Nigatu, Kasahun Sime, Lisa Matay, Milind Tambe, Stéphane Verguet

As part of nationwide efforts aligned with the United Nations' Sustainable Development Goal 3 on Universal Health Coverage, Ethiopia's Ministry of Health is strengthening health posts to expand access to essential healthcare services. However, only a fraction of this health system strengthening effort can be implemented each year due to limited budgets and other competing priorities, hence the need for an optimization framework to guide prioritization across the regions of Ethiopia. In this paper, we develop a tool, Health Access Resource Planner (HARP), based on a principled decision-support optimization framework for sequential facility planning that aims to maximize population coverage under budget uncertainty while satisfying region-specific proportionality targets at every time step. We then propose two algorithms: (i) a learning-augmented approach that improves upon expert recommendations at any single step; and (ii) a greedy algorithm for multi-step planning, both with strong worst-case approximation guarantees. In collaboration with the Ethiopian Public Health Institute and Ministry of Health, we demonstrate the empirical efficacy of our method on three regions across various planning scenarios.


#16 ShortageSim: Simulating Drug Shortages Under Information Asymmetry

Authors: Mingxuan Cui, Yilan Jiang, Duo Zhou, Cheng Qian, Yuji Zhang, Qiong Wang

Drug shortages pose critical risks to patient care and healthcare systems worldwide, yet the effectiveness of regulatory interventions remains poorly understood due to information asymmetries in pharmaceutical supply chains. We propose ShortageSim, which addresses this challenge by providing the first simulation framework that evaluates the impact of regulatory interventions on competition dynamics under information asymmetry. Using Large Language Model (LLM)-based agents, the framework models the strategic decisions of drug manufacturers and institutional buyers in response to shortage alerts issued by the regulatory agency. Unlike traditional game theory models that assume perfect rationality and complete information, ShortageSim simulates heterogeneous interpretations of regulatory announcements and the resulting decisions. Experiments on a self-processed dataset of historical shortage events show that ShortageSim reduces the resolution lag for production disruption cases by up to 84%, achieving closer alignment to real-world trajectories than the zero-shot baseline. Our framework confirms the effect of regulatory alerts in addressing shortages and introduces a new method for understanding competition in multi-stage environments under uncertainty. We open-source ShortageSim and a dataset of 2,925 FDA shortage events, providing a novel framework for future research on policy design and testing in supply chains under information asymmetry.


#17 SatSolarCast: A Flexible Framework for Multimodal Solar Irradiance Forecasting via Memory-Alignment Learning

Authors: Kuai Dai, Hui Su, Chengxing Zhai, Huiwei Lin, Mingliang Bai

Solar irradiance forecasting aims to accurately estimate future solar irradiance based on historical data, playing a vital role in energy production and grid management. While ground-based station measurements provide local accuracy, geostationary satellites offer much broader environmental contexts, such as cloud coverage, which serves as a key factor for accurate forecasting. However, effectively integrating these multimodal observations remains a challenge, with existing methods suffering from inflexibility and high computational costs. To address this problem, we propose SatSolarCast, a flexible and efficient multimodal framework that introduces a memory-alignment learning mechanism to integrate geostationary satellite data and historical irradiance observations. By preserving and recalling long-term spatiotemporal patterns from a specialized satellite memory bank, SatSolarCast enables effective guidance for both short- and long-term prediction. Additionally, SatSolarCast offers plug-and-play compatibility and can be incorporated into various forecasting architectures. Extensive experiments across four ground stations demonstrate that SatSolarCast substantially improves forecasting performance compared to prior methods with much lower computational costs.


#18 Efficient Forecasting of Geostationary Infrared Brightness Temperature Sequences: A Benchmark and a Lightweight Model

Authors: Kuai Dai, Hui Su, Xutao Li, Chengxing Zhai

Forecasting geostationary infrared brightness temperature sequences from historical observations is a significant and challenging task. By analyzing these predictions, cloud evolution, convective activity, and atmospheric radiative states can be revealed in advance, offering high potential value in domains such as weather nowcasting, energy management, and disaster monitoring. Recently, artificial intelligence techniques have provided valuable insights into this task. However, as a nascent research area, the lack of a standardized, high-quality benchmark has significantly impeded progress. Moreover, training existing deep learning models for this task remains computationally expensive due to the complexity of their network architectures and modeling mechanisms. To address these challenges, we introduce a new benchmark, FY4ABT, and propose a lightweight prediction model, WavePredNet. Specifically, FY4ABT comprises three sub-datasets designed to respectively evaluate prediction performance under short-term, medium-term, and long-term scenarios. Meanwhile, WavePredNet effectively captures multi-scale dynamics, including both low- and high-frequency components with low computational costs while delivering exceptional performance.


#19 RiverScope: High-Resolution River Masking Dataset

Authors: Rangel Daroya, Taylor Rowley, Jonathan Acero Flores, Elisa Friedmann, Fiona B Bennitt, Heejin An, Travis Thomas Simmons, Marissa Hughes, Camryn L Kluetmeier, Solomon Kica, J. Daniel Vélez, Sarah E. Esenther, Thomas E. Howard, Yanqi Ye, Audrey J. Turcotte, Colin Gleason, Subhransu Maji

Surface water dynamics play a critical role in Earth’s climate system, influencing ecosystems, agriculture, disaster resilience, and sustainable development. Yet monitoring rivers and surface water at fine spatial and temporal scales remains challenging---especially for narrow or sediment-rich rivers that are poorly captured by low-resolution satellite data. To address this, we introduce RiverScope, a high-resolution dataset developed through collaboration between computer science and hydrology experts. RiverScope comprises 1,145 high-resolution images (covering 2,577 square kilometers) with expert-labeled river and surface water masks, requiring over 100 hours of manual annotation. Each image is co-registered with Sentinel-2, SWOT, and the SWOT River Database (SWORD), enabling the evaluation of cost-accuracy trade-offs across sensors---a key consideration for operational water monitoring. We also establish the first global, high-resolution benchmark for river width estimation, achieving a median error of 7.2 meters---significantly outperforming existing satellite-derived methods. We extensively evaluate deep networks across multiple architectures (e.g., CNNs and transformers), pretraining strategies (e.g., supervised and self-supervised), and training datasets (e.g., ImageNet and satellite imagery). Our best-performing models combine the benefits of transfer learning with the use of all the multispectral PlanetScope channels via learned adaptors. RiverScope provides a valuable resource for fine-scale and multi-sensor hydrological modeling, supporting climate adaptation and sustainable water management.

Subject: AAAI.2026 - Special Track on AI for Social Impact


#20 When Proxy Agents Disagree, Do Humans Mirror? Manipulating Human Behavior in Moral Dilemmas Through Agents [PDF] [Copy] [Kimi] [REL]

Authors: Haotian Deng, Sitian Wang, Ruxin Wang, Chen Wei, Quanying Liu

The diversity across populations and the variability between individuals have long posed a significant challenge in cognitive science. Although large language models (LLMs) have made notable progress in aligning with human values, faithfully capturing the high degree of diversity and uncertainty in human judgment remains an unresolved challenge. This study investigates whether computational models, or "proxy agents," can not only emulate human decision patterns but also systematically modulate them. We propose a framework wherein we first fine-tune BERT-based proxy agents to replicate both aggregate and individual-level human judgments on a large-scale moral dilemma dataset. We then hypothesize that stimuli identified as maximally divisive for these individualized agents will similarly elicit high disagreement among human participants. Through a manipulation experiment, we validate this hypothesis, demonstrating that agent-selected stimuli can predictably induce targeted divergence in human moral choices. Our findings provide empirical evidence that AI agents can bias human perceptual variability by strategically filtering information. We further analyze this induced moral divergence using a Bayesian framework and concept decomposition to identify the distinct conceptual dimensions driving individual differences. This work quantifies the potential for AI-driven cognitive modulation and underscores the urgent need for ethical guidelines to prevent the misuse of such capabilities.

Subject: AAAI.2026 - Special Track on AI for Social Impact


#21 Toward Gaze Target Detection of Young Autistic Children [PDF] [Copy] [Kimi] [REL]

Authors: Shijian Deng, Erin E. Kosloski, Siva Sai Nagender Vasireddy, Jia Li, Randi Sierra Sherwood, Feroz Mohamed Hatha, Siddhi Patel, Pamela R. Rollins, Yapeng Tian

The automatic detection of gaze targets in autistic children through artificial intelligence can be impactful, especially for those who lack access to a sufficient number of professionals to improve their quality of life. This paper introduces a new, real-world AI application for gaze target detection in autistic children, which predicts a child's point of gaze from an activity image. This task is foundational for building automated systems that can measure joint attention—a core challenge in Autism Spectrum Disorder (ASD). To facilitate the study of this challenging application, we collected the first-ever Autism Gaze Target (AGT) Dataset. We further propose a novel social-aware coarse-to-fine (SACF) gaze detection framework that explicitly leverages the social context of a scene to overcome the class imbalance common in autism datasets—a consequence of autistic children's tendency to show reduced gaze to faces. It utilizes a two-pathway architecture with expert models specialized in social and non-social gaze, guided by a context-awareness gate module. The results of our comprehensive experiments demonstrate that our framework achieves new state-of-the-art performance for gaze target detection in this population, significantly outperforming existing methods, especially on the critical minority class of face-directed gaze.

Subject: AAAI.2026 - Special Track on AI for Social Impact


#22 GraphVSSM: Graph Variational State-Space Model for Probabilistic Spatiotemporal Inference of Dynamic Exposure and Vulnerability for Regional Disaster Resilience Assessment [PDF] [Copy] [Kimi] [REL]

Authors: Joshua Dimasaka, Christian Geiß, Emily So

Regional disaster resilience quantifies the changing nature of physical risks to inform policy instruments ranging from local immediate recovery to international sustainable development. While many existing state-of-practice methods have greatly advanced the dynamic mapping of exposure and hazard, our understanding of large-scale physical vulnerability has remained static, costly, limited, region-specific, coarse-grained, overly aggregated, and inadequately calibrated. With the significant growth in the availability of time-series satellite imagery and derived products for exposure and hazard, we focus our work on the equally important yet challenging element of the risk equation: physical vulnerability. Given this unique problem, we leverage machine learning methods that flexibly capture spatial contextual relationships, limited temporal observations, and uncertainty in a unified probabilistic spatiotemporal inference framework. We therefore introduce Graph Variational State-Space Model (GraphVSSM), a novel modular spatiotemporal approach that uniquely integrates graph deep learning, state-space modeling, and variational inference using time-series data and prior expert belief systems in a weakly supervised or coarse-to-fine-grained manner. We present three major results: a city-wide demonstration in Quezon City, Philippines; an investigation of sudden changes in the cyclone-impacted coastal Khurushkul community (Bangladesh) and the mudslide-affected Freetown (Sierra Leone); and an open geospatial dataset, METEOR 2.5D, that spatiotemporally enhances the existing global static dataset for 46 UN-recognized Least Developed Countries (as of 2020). Beyond advancing the practice of regional disaster resilience assessment and improving our understanding of global progress in disaster risk reduction, our method also offers a probabilistic deep learning approach, contributing to broader urban studies that require compositional data analysis in weakly supervised settings.

Subject: AAAI.2026 - Special Track on AI for Social Impact


#23 Capacity Constraints Make Admissions Processes Less Predictable [PDF] [Copy] [Kimi] [REL]

Authors: Evan Dong, Nikhil Garg, Sarah Dean

Machine learning models are often used to make predictions about the outcomes of applications to selective programs. Many prospective school or college applicants turn to machine learning models to predict whether they will be admitted to a program, and employers may use algorithmic tools to filter out resumes predicted to have a low probability of being hired when offering interviews for a job opening. However, such decision processes differ substantially from the conventional machine learning setting: decisions are not independent across applicants. Whether a student is admitted depends on the other applicants who apply because admissions decisions are capacity-constrained. We formalize how the nature of admission decisions results in a data-generating process that is incompatible with traditional machine learning assumptions. We characterize how the properties of selection functions affect the difficulty of generalization to applicant pool distribution shifts, introducing two concepts: stability, which measures how many existing decisions can change when a single new applicant is introduced; and variability, which measures the number of unique students whose decisions can change. We demonstrate our theory on admissions data from the New York City high school matching system, showing that machine learning performance degrades as the applicant pool increasingly differs from the training data. Furthermore, there are larger performance drops for schools using decision rules that are less stable and more variable. Our work raises questions about the reliability of predicting individual admissions probabilities.

Subject: AAAI.2026 - Special Track on AI for Social Impact


#24 Characterizing AI Manipulation Risks in Brazilian YouTube Climate Discourse [PDF] [Copy] [Kimi] [REL]

Authors: Wenchao Dong, Marcelo Sartori Locatelli, Virgilio Almeida, Meeyoung Cha

Climate change poses a global threat to public health, food security, and economic stability. Addressing it requires evidence-based policies and a nuanced understanding of how the threat is perceived by the public, particularly within visual social media, where narratives quickly evolve through voices of individuals, politicians, NGOs, and institutions. This study investigates climate-related discourse on YouTube within the Brazilian context, a geopolitically significant nation in global environmental negotiations. Through three case studies, we examine (1) which psychological content traits most effectively drive audience engagement, (2) the extent to which these traits influence content popularity, and (3) whether such insights can inform the design of persuasive synthetic campaigns such as climate denialism using recent generative language models. Another contribution of this work is the release of a large publicly available dataset of 226K Brazilian YouTube videos and 2.7M user comments on climate change. The dataset includes fine-grained annotations of persuasive strategies, theory of mind categorizations in user responses, and typologies of content creators. This resource can help support future research on digital climate communication and the ethical risk of algorithmically amplified narratives and generative media.

Subject: AAAI.2026 - Special Track on AI for Social Impact


#25 How Can You Tell if Your Large Language Model Could Be a Closet Antisemite? An Explainability-Based Audit Framework for Implicit Bias [PDF] [Copy] [Kimi] [REL]

Authors: Arka Dutta, Reza Fayyazi, Shanchieh Yang, Ashiqur R. KhudaBukhsh

Auditing large language models (LLMs) for biases is an ongoing and dynamic process, resembling a proverbial cat-and-mouse game. As researchers identify new vulnerabilities in LLMs, guardrails are updated to address them, prompting the need for innovative approaches to audit the increasingly fortified LLMs for biases. This paper makes three contributions. First, it introduces a scalable, explainable framework to measure biases against various identity groups across multiple open large language models. Second, it conducts a bias audit considering five well-known open LLMs and demonstrates their bias inclinations towards several historically disadvantaged groups. Our audit reveals disturbing antisemitic, Islamophobic, and xenophobic biases present in several well-known LLMs. Finally, we release a dataset of 1,000 probes curated under the supervision of an expert social scientist that can facilitate similar audits.

Subject: AAAI.2026 - Special Track on AI for Social Impact