There is a growing trend toward AI systems interacting with humans to revolutionize a range of application domains such as healthcare and transportation. However, unsafe human-machine interaction can lead to catastrophic failures. We propose a novel approach that predicts future states by accounting for the uncertainty of human interaction, monitors whether predictions satisfy or violate safety requirements, and adapts control actions based on the predictive monitoring results. Specifically, we develop a new quantitative predictive monitor based on Signal Temporal Logic with Uncertainty (STL-U) to compute a robustness degree interval, which indicates the extent to which a sequence of uncertain predictions satisfies or violates an STL-U requirement. We also develop a new loss function to guide the uncertainty calibration of Bayesian deep learning and a new adaptive control method, both of which leverage STL-U quantitative predictive monitoring results. We apply the proposed approach to two case studies: Type 1 Diabetes management and semi-autonomous driving. Experiments show that the proposed approach improves safety and effectiveness in both case studies.
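For illustration, here is a minimal sketch of how a robustness-degree interval could be computed for a simple STL-U requirement of the form "always stay above a threshold", assuming the uncertain predictions are given as per-step confidence intervals. The interval semantics below are an illustrative reading of the abstract, not the paper's exact definition.

```python
# Hedged sketch: robustness-degree interval for a simple STL-U requirement
# G_[0,T] (x >= c), i.e., "the signal stays at or above threshold c".
# Predictions are uncertainty intervals (lo_t, hi_t) per future time step,
# as a Bayesian predictor might produce.

def always_above(pred_intervals, c):
    """Return (rho_lo, rho_hi): the robustness interval of G (x >= c).

    rho_lo uses the pessimistic bound (lower edge of each prediction),
    rho_hi the optimistic bound (upper edge). A positive rho_lo means the
    requirement holds for every value consistent with the predictions;
    a negative rho_hi means it is violated for every such value.
    """
    rho_lo = min(lo - c for lo, hi in pred_intervals)
    rho_hi = min(hi - c for lo, hi in pred_intervals)
    return rho_lo, rho_hi

# Example: predicted glucose intervals (mg/dL) over 3 future steps,
# requirement: glucose stays >= 70.
intervals = [(92.0, 110.0), (81.0, 104.0), (68.0, 95.0)]
print(always_above(intervals, 70.0))   # -> (-2.0, 25.0): inconclusive
```

Under this reading, a positive lower bound would indicate guaranteed satisfaction, a negative upper bound guaranteed violation, and anything in between a signal for the adaptive controller to act conservatively.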
A reproducibility crisis has been reported in science, but the extent to which it affects AI research is not yet fully understood. We therefore performed a systematic replication study of 30 highly cited AI studies, relying on original materials when available. In the end, eight articles were rejected because they required access to data or hardware that was practically impossible to acquire as part of the project. Six articles were successfully reproduced, while five were partially reproduced. In total, 50% of the articles included were reproduced to some extent. The availability of code and data correlates strongly with reproducibility, as 86% of articles that shared both code and data were fully or partly reproduced, while this was true for only 33% of articles that shared only data. The quality of the data documentation also correlates with successful replication: poorly documented or mis-specified data will probably result in unsuccessful replication. Surprisingly, the quality of the code documentation does not correlate with successful replication. Whether the code is poorly documented, partially missing, or not versioned does not matter for successful replication, as long as the code is shared. This study emphasizes the effectiveness of open science and the importance of properly documenting data work.
Mainstream backdoor attacks on large language models (LLMs) typically set a fixed trigger in the input instance and specific responses for triggered queries. However, a fixed trigger (e.g., unusual words) may be easily spotted by human inspection, limiting effectiveness and practicality in real-world scenarios. To enhance the stealthiness of backdoor activation, we present a new poisoning paradigm against LLMs that is triggered by specifying generation conditions, which are strategies commonly adopted by users during model inference. The poisoned model behaves normally under normal or other generation conditions, but becomes harmful under the target generation conditions. To achieve this objective, we introduce BrieFool, an efficient attack framework. It leverages the characteristics of generation conditions through efficient instruction sampling and poisoning-data generation, thereby influencing the behavior of LLMs under target conditions. Our attack can be broadly divided into two types with different targets: the safety-unalignment attack and the ability-degradation attack. Extensive experiments demonstrate that BrieFool is effective across safety and ability domains, achieving higher success rates than baseline methods, with 94.3% on GPT-3.5-turbo.
Recently, there have been significant advancements in music generation. However, existing models primarily focus on creating modern pop songs, making it challenging to produce ancient music with distinct rhythms and styles, such as ancient Chinese SongCi. In this paper, we introduce SongSong, to our knowledge the first music generation model capable of restoring Chinese SongCi. Our model first predicts the melody from the input SongCi, then separately generates the singing voice and accompaniment based on that melody, and finally combines all elements into the final piece of music. Additionally, to address the lack of ancient-music datasets, we create OpenSongSong, a comprehensive dataset of ancient Chinese SongCi music featuring 29.9 hours of compositions by various renowned SongCi music masters. To assess SongSong's proficiency in performing SongCi, we randomly select 85 SongCi sentences that were not part of the training set and evaluate SongSong against music generation platforms such as Suno and SkyMusic. The subjective and objective results indicate that our proposed model achieves leading performance in generating high-quality SongCi music.
In recent years, Text-to-Image (T2I) models have garnered significant attention due to their remarkable advancements. However, security concerns have emerged due to their potential to generate inappropriate or Not-Safe-For-Work (NSFW) images. In this paper, inspired by the observation that texts with different semantics can lead to similar human perceptions, we propose an LLM-driven perception-guided jailbreak method, termed PGJ. It is a black-box jailbreak method that requires no specific T2I model (model-free) and generates highly natural attack prompts. Specifically, we propose identifying a safe phrase that is similar in human perception yet inconsistent in text semantics with the target unsafe word and using it as a substitution. The experiments conducted on six open-source models and commercial online services with thousands of prompts have verified the effectiveness of PGJ.
In artificial intelligence (AI), many legal conflicts have arisen, especially concerning privacy and copyright associated with training data. When an AI model's training data raises privacy concerns, it becomes imperative to develop a new model devoid of influences from such contentious data. However, retraining from scratch is often not viable due to the extensive data requirements and heavy computational costs. Machine unlearning presents a promising solution by enabling the selective erasure of specific knowledge from models. Despite its potential, many existing approaches to machine unlearning are based on scenarios that are either impractical or could lead to unintended degradation of model performance. In this work, we utilize the concept of weight prediction to approximate the less-learned weights based on observations of further training. By repeating 1) fine-tuning on the specific data and 2) weight prediction, our method gradually eliminates knowledge about that data. We verify its ability to eliminate side effects caused by problematic data and show its effectiveness across various architectures, datasets, and tasks.
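As a rough illustration of the fine-tune-then-predict loop described above, the following hedged sketch briefly fine-tunes on the data to be forgotten, observes the resulting weight change, and extrapolates against it. The extrapolation factor `alpha` and the loop structure are assumptions for illustration, not the paper's exact procedure.

```python
# Hedged sketch of one "weight prediction" round: fine-tune on the data to be
# forgotten, observe how the weights move, then step in the opposite direction
# to approximate "less-learned" weights.
import copy
import torch

def unlearning_round(model, forget_loader, loss_fn, alpha=1.0, lr=1e-4, steps=10):
    before = copy.deepcopy(model.state_dict())

    # 1) Briefly fine-tune on the data to be forgotten.
    opt = torch.optim.SGD(model.parameters(), lr=lr)
    for _, (x, y) in zip(range(steps), forget_loader):
        opt.zero_grad()
        loss_fn(model(x), y).backward()
        opt.step()

    # 2) Predict "less-learned" weights by stepping against the observed change.
    after = model.state_dict()
    predicted = {k: before[k] - alpha * (after[k] - before[k]) for k in before}
    model.load_state_dict(predicted)
    return model
```

Repeating such rounds is the gradual elimination the abstract refers to; how `alpha` is chosen and when to stop are details the abstract does not specify.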
In this paper, we address the limitations of existing text-to-image diffusion models in generating demographically fair results when given human-related descriptions. These models often struggle to disentangle the target language context from sociocultural biases, resulting in biased image generation. To overcome this challenge, we propose Fair Mapping, a flexible, model-agnostic, and lightweight approach that modifies a pre-trained text-to-image diffusion model by controlling the prompt to achieve fair image generation. A key advantage of our approach is its high efficiency: it only requires updating an additional linear network with few parameters at low computational cost. By developing a linear network that maps conditioning embeddings into a debiased space, we enable the generation of relatively balanced demographic results for the specified text condition. Comprehensive experiments on face image generation show that our method significantly improves fairness while preserving nearly the same image quality as conventional diffusion models when prompted with human-related descriptions. By effectively addressing implicit language bias, our method produces fairer and more diverse image outputs.
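A minimal sketch of the kind of lightweight mapping the abstract describes: a single linear layer, initialized near the identity, applied to the text-conditioning embeddings before they reach a frozen diffusion model. Dimensions, initialization, and the training objective are illustrative assumptions, not the paper's exact design.

```python
# Hedged sketch: a debiasing linear mapping on prompt embeddings.
import torch
import torch.nn as nn

class FairMapping(nn.Module):
    def __init__(self, embed_dim: int):
        super().__init__()
        # Start close to the identity so image quality is preserved initially.
        self.linear = nn.Linear(embed_dim, embed_dim)
        nn.init.eye_(self.linear.weight)
        nn.init.zeros_(self.linear.bias)

    def forward(self, cond_embeddings: torch.Tensor) -> torch.Tensor:
        # cond_embeddings: (batch, tokens, embed_dim) prompt embeddings.
        return self.linear(cond_embeddings)

# Usage: only the mapping is trained; the text encoder and diffusion model stay frozen.
mapper = FairMapping(embed_dim=768)
debiased = mapper(torch.randn(2, 77, 768))  # feed `debiased` to the frozen diffusion model
```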
Link prediction in dynamic graphs (LPDG) has been widely applied to real-world applications such as website recommendation, traffic flow prediction, and organizational studies. These models are usually kept local and secure, with only a restricted interactive interface available to the public. The problem of black-box evasion attacks on LPDG models, where both model interactions and data perturbations are restricted, is therefore essential and meaningful in practice. In this paper, we propose the first practicable black-box evasion attack method that achieves effective attacks against the target LPDG model within a limited number of interactions and perturbations. To perform effective attacks under limited perturbations, we develop a graph sequential embedding model to find the desired state embedding of the dynamic graph sequences under a deep reinforcement learning framework. To overcome the scarcity of interactions, we design a multi-environment training pipeline and train our agent on multiple instances that share an aggregated interaction buffer. Finally, we evaluate our attack against three advanced LPDG models on three real-world graph datasets of different scales and compare its performance with related methods under the same interaction and perturbation constraints. Experimental results show that our attack is both effective and practicable.
Digital watermarking has shown its effectiveness in protecting multimedia content. However, existing watermarking methods are predominantly tailored to specific media types, rendering them less effective for protecting content displayed on computer screens, which is often multi-modal and dynamic. Visual Screen Content (VSC) is particularly susceptible to theft and leakage through screenshots, a vulnerability that current watermarking methods fail to adequately address. To address these challenges, we propose ScreenMark, a robust and practical watermarking method designed specifically for arbitrary VSC protection. ScreenMark utilizes a three-stage progressive watermarking framework. Initially, inspired by diffusion principles, we initialize the mutual transformation between regular watermark information and irregular watermark patterns. Subsequently, these patterns are integrated with screen content using a pre-multiplied alpha blending technique, supported by a pre-trained screen decoder for accurate watermark retrieval. A progressively more complex distorter enhances the robustness of the watermark in real-world screenshot scenarios. Finally, the model undergoes fine-tuning guided by a joint-level distorter to ensure optimal performance. To validate the effectiveness of ScreenMark, we compiled a dataset comprising 100,000 screenshots from various devices and resolutions. Extensive experiments on different datasets confirm the method's superior robustness, imperceptibility, and practical applicability.
In the last few years, the artifact patterns in fake images synthesized by different generative models have been inconsistent, causing the failure of previous detectors that relied on spotting subtle differences between real and fake images. In our preliminary experiments, we find that the artifacts in fake images keep changing as generative models develop, whereas natural images exhibit stable statistical properties. In this paper, we therefore employ natural traces, shared only by real images, as an additional target for the classifier. Specifically, we introduce a self-supervised feature-mapping process for natural-trace extraction and develop a transfer-learning scheme based on a soft contrastive loss that brings images closer to the natural traces of real images and further away from those of fake ones. This encourages the detector to make decisions based on the proximity of images to the natural traces. For a comprehensive evaluation, we built a high-quality and diverse dataset covering both GANs and diffusion models, and use it to assess generalization to unknown forgery techniques and robustness to different transformations. Experimental results show that our proposed method achieves 96.2% mAP, significantly outperforming the baselines. Extensive experiments on the widely recognized Midjourney platform show that our method achieves an accuracy exceeding 78.4%, underscoring its practicality for real-world deployment.
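The soft contrastive idea might look roughly like the following hedged sketch, where real-image features are pulled toward a learned "natural trace" anchor and fake-image features are pushed away with soft, probabilistic weights. The specific form of the loss is our illustration, not the paper's.

```python
# Hedged sketch of a soft contrastive objective around a natural-trace anchor.
import torch
import torch.nn.functional as F

def soft_contrastive_loss(features, trace_anchor, labels, temperature=0.1):
    """features: (B, D) image embeddings; trace_anchor: (D,) natural-trace embedding;
    labels: (B,) with 1 = real, 0 = fake."""
    sim = F.cosine_similarity(features, trace_anchor.unsqueeze(0), dim=-1) / temperature
    # Soft targets: real images should sit close to the natural trace (prob -> 1),
    # fake images far from it (prob -> 0).
    prob_close = torch.sigmoid(sim)
    return F.binary_cross_entropy(prob_close, labels.float())
```

At test time, the same proximity signal is what would let the detector reason about closeness to the natural traces rather than about generator-specific artifacts.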
Despite the impressive success of text-to-image (TTI) models, existing studies overlook the issue of whether these models accurately convey factual information. In this paper, we focus on the problem of image hallucination, where images created by TTI models fail to faithfully depict factual content. To address this, we introduce I-HallA (Image Hallucination evaluation with Question Answering), a novel automated evaluation metric that measures the factuality of generated images through visual question answering (VQA). We also introduce I-HallA v1.0, a curated benchmark dataset for this purpose. As part of this process, we develop a pipeline that generates high-quality question-answer pairs using multiple GPT-4 Omni-based agents, with human judgments to ensure accuracy. Our evaluation protocol measures image hallucination by testing whether these questions can be answered correctly from images generated by existing TTI models. The I-HallA v1.0 dataset comprises 1.2K diverse image-text pairs across nine categories, with 1,000 rigorously curated questions covering various compositional challenges. We evaluate five TTI models using I-HallA and reveal that these state-of-the-art models often fail to accurately convey factual information. Moreover, we validate the reliability of our metric by demonstrating a strong Spearman correlation (ρ=0.95) with human judgments. Our benchmark dataset and metric can serve as a foundation for developing factually accurate TTI models.
Backdoor attacks present a serious security threat to deep neural networks (DNNs). Although numerous effective defense techniques have been proposed in recent years, they inevitably rely on the availability of either clean or poisoned data. In contrast, data-free defense techniques have evolved slowly and still lag significantly in performance. To address this issue, and in contrast with the traditional approach of pruning followed by fine-tuning, we propose a novel data-free defense method named Optimal Transport-based Backdoor Repairing (OTBR). Based on our findings on the neuron weight changes (NWCs) of random unlearning, the method uses optimal transport (OT)-based model fusion to combine the advantages of both pruned and backdoored models. Specifically, we first show that the NWCs of random unlearning are positively correlated with those of poison unlearning. Based on this observation, we propose a random-unlearning NWC pruning technique to eliminate the backdoor effect and obtain a backdoor-free pruned model. Then, motivated by OT-based model fusion, we propose a pruned-to-backdoored OT-based fusion technique that fuses the pruned and backdoored models, combining the advantages of both and resulting in a model with high clean accuracy and a low attack success rate. To our knowledge, this is the first work to apply OT and model-fusion techniques to backdoor defense. Extensive experiments show that our method successfully defends against all seven tested backdoor attacks across three benchmark datasets, outperforming both state-of-the-art (SOTA) data-free and data-dependent methods.
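A hedged sketch of the NWC-based pruning step, assuming a fully connected layer: compare neuron weights before and after a brief random-unlearning pass, rank neurons by the magnitude of the change, and zero out the top fraction. The per-neuron norm and the pruning ratio are illustrative choices, not the paper's exact settings.

```python
# Hedged sketch: prune the neurons whose weights changed most under random unlearning.
import torch

def nwc_prune(backdoored_state, unlearned_state, layer_name, prune_ratio=0.1):
    w0 = backdoored_state[layer_name]          # (out_features, in_features)
    w1 = unlearned_state[layer_name]
    nwc = (w1 - w0).abs().sum(dim=1)           # one change score per output neuron
    k = int(prune_ratio * nwc.numel())
    top = torch.topk(nwc, k).indices           # neurons that moved the most
    pruned = w0.clone()
    pruned[top] = 0.0                          # remove the suspected backdoor neurons
    return pruned
```

The OT-based fusion step would then merge this pruned model back with the original backdoored model; its details are beyond what the abstract specifies.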
Perturbation-based mechanisms, such as differential privacy, mitigate gradient leakage attacks by introducing noise into the gradients, thereby preventing attackers from reconstructing clients' private data from the leaked gradients. However, can gradient perturbation protection mechanisms truly defend against all gradient leakage attacks? In this paper, we present the first attempt to break the shield of gradient perturbation protection in Federated Learning for the extraction of private information. We focus on common noise distributions, specifically Gaussian and Laplace, and apply our approach to DNN and CNN models. We introduce Mjölnir, a perturbation-resilient gradient leakage attack that is capable of removing perturbations from gradients without requiring additional access to the original model structure or external data. Specifically, we leverage the inherent diffusion properties of gradient perturbation protection to develop a novel diffusion-based gradient denoising model for Mjölnir. By constructing a surrogate client model that captures the structure of perturbed gradients, we obtain crucial gradient data for training the diffusion model. We further utilize the insight that monitoring disturbance levels during the reverse diffusion process can enhance gradient denoising capabilities, allowing Mjölnir to generate gradients that closely approximate the original, unperturbed versions through adaptive sampling steps. Extensive experiments demonstrate that Mjölnir effectively recovers the protected gradients and exposes the Federated Learning process to the threat of gradient leakage, achieving superior performance in gradient denoising and private data recovery.
Large language models have repeatedly shown outstanding performance across diverse applications. However, deploying these models can inadvertently risk user privacy, and their significant memory demands during training place a heavy load on resources, raising considerable practical concerns. In this paper, we introduce DP-MemArc, a novel training framework aimed at reducing the memory cost of large language models while protecting user data privacy. DP-MemArc incorporates side-network or reversible-network designs to support a variety of differentially private, memory-efficient fine-tuning schemes. Our approach not only achieves roughly a 2.5x improvement in memory efficiency but also ensures robust privacy protection, keeping user data secure and confidential. Extensive experiments demonstrate that DP-MemArc effectively provides differentially private, memory-efficient fine-tuning across different task scenarios.
Reliable failure detection holds paramount importance in safety-critical applications. Yet neural networks are known to produce overconfident predictions for misclassified samples, and failure detection remains problematic because existing confidence-score functions rely on category-level signals, the logits. This research introduces an innovative strategy that leverages human-level concepts for a dual purpose: to reliably detect when a model fails and to transparently interpret why. By integrating a nuanced array of signals for each category, our method enables a finer-grained assessment of the model's confidence. We present a simple yet highly effective approach based on the ordinal ranking of concept activations for the input image. Without bells and whistles, our method significantly reduces the false positive rate across diverse real-world image classification benchmarks, by 3.7% on ImageNet and 9.0% on EuroSAT.
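A minimal sketch of a rank-based confidence score in this spirit, assuming each class is associated with a set of human-level concepts: check how highly the predicted class's concepts rank among all concept activations and treat low rank agreement as a likely failure. The scoring rule is our illustration, not the paper's exact formulation.

```python
# Hedged sketch: confidence from the ordinal rank of a class's expected concepts.
import numpy as np

def rank_confidence(concept_activations, class_concepts, predicted_class):
    """concept_activations: (num_concepts,) activations for one image;
    class_concepts: dict class -> list of concept indices expected to be active."""
    acts = np.asarray(concept_activations)
    order = np.argsort(-acts)                               # best rank = 0
    ranks = [int(np.where(order == c)[0][0]) for c in class_concepts[predicted_class]]
    n = len(acts)
    # Average normalized rank of the expected concepts; higher = more confident.
    return float(np.mean([1.0 - r / (n - 1) for r in ranks]))

# Flag a failure when the score drops below a validation-chosen threshold.
```

The ranks also tell a human which expected concepts were missing, which is the interpretability angle the abstract emphasizes.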
With the development of large language models (LLMs), numerous online applications based on these models have emerged. Because system prompts significantly influence the performance of LLMs, many such applications conceal their system prompts and regard them as intellectual property, and numerous efforts have been made to steal these prompts. However, for applications that do not publicly disclose their system prompts, prompts recovered by previous methods come with low confidence, because those methods rely on confirmation from application developers, which is unrealistic: developers may be unwilling to acknowledge that their system prompts have been leaked. We observed a phenomenon: when an LLM performs repetitive tasks, it repeats accurately based on the context rather than relying on its internal model parameters. We validated this phenomenon by comparing the results of two different inputs, repetitive tasks and knowledge-based tasks, under conditions of normal execution, contaminated execution, and partially restored execution. By contaminating the input nouns and then partially restoring them using data from the intermediate layers of the normal execution, we measured the accuracy of both task types across these three execution processes. Based on this phenomenon, we propose a high-confidence leakage method called RepeatLeakage. By specifying the range the model needs to repeat and encouraging the model not to change the format, we manage to extract the system prompt and conversation contexts. We validated the repetition phenomenon on multiple open-source models and successfully designed prompts with RepeatLeakage to leak contents from the actual system prompts of GPT-Store applications and publicly available ChatGPT conversation contexts. Finally, we tested RepeatLeakage in real environments such as the ChatGPT web interface, successfully leaking system prompts and conversation contexts.
With the widespread adoption of AI systems, many of the decisions once made by humans are now delegated to automated systems. Recent works in the literature demonstrate that these automated systems, when used in socially sensitive domains, may exhibit discriminatory behavior based on sensitive characteristics such as gender, sex, religion, or race. In light of this, various notions of fairness and methods to quantify discrimination have been proposed, leading to the development of numerous approaches for constructing fair predictors. At the same time, imposing fairness constraints may decrease the utility of the decision-maker, highlighting a tension between fairness and utility. This tension is also recognized in legal frameworks, for instance in the disparate impact doctrine of Title VII of the Civil Rights Act of 1964, in which specific attention is given to considerations of "business necessity", possibly allowing the usage of proxy variables associated with the sensitive attribute when a high enough utility cannot be achieved without them. In this work, we analyze the tension between fairness and accuracy from a causal lens for the first time. We introduce the notion of a path-specific excess loss (PSEL), which captures how much the predictor's loss increases when a causal fairness constraint is enforced. We then show that the total excess loss (TEL), defined as the difference in loss between a predictor that is fair along all causal pathways and an unconstrained predictor, can be decomposed into a sum of more local PSELs. At the same time, enforcing a causal constraint often reduces the disparity between demographic groups. We therefore introduce a quantity that summarizes the fairness-utility trade-off, called the causal fairness/utility ratio, defined as the ratio of the reduction in discrimination to the excess loss incurred by constraining a causal pathway. This quantity is particularly suitable for comparing the fairness-utility trade-off across different causal pathways. Finally, since our approach requires causally constrained fair predictors, we introduce a new neural approach for causally constrained fair learning. Our approach is evaluated on multiple real-world datasets, providing new insights into the tension between fairness and accuracy.
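Read literally, the quantities above admit a simple formalization. The notation below is our hedged reading of the abstract, with L(.) a loss, f^pi the predictor constrained to be fair along pathway pi, f^all the predictor fair along all pathways, and f^* the unconstrained predictor.

```latex
% Hedged formalization of the abstract's quantities (our notation, not the paper's):
\mathrm{PSEL}(\pi) \;=\; L\!\left(f^{\pi}\right) - L\!\left(f^{*}\right), \qquad
\mathrm{TEL} \;=\; L\!\left(f^{\mathrm{all}}\right) - L\!\left(f^{*}\right) \;=\; \sum_{\pi} \mathrm{PSEL}(\pi), \qquad
\rho(\pi) \;=\; \frac{\Delta\,\mathrm{Disc}(\pi)}{\mathrm{PSEL}(\pi)},
```

where Delta Disc(pi) denotes the reduction in discrimination obtained by constraining pathway pi, and rho(pi) is the causal fairness/utility ratio used to compare pathways.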
The rapid development of image generation models has facilitated the widespread dissemination of generated images on social networks, creating favorable conditions for provably secure image steganography. However, existing methods face issues such as low quality of the generated images and a lack of semantic control in the generation process. To leverage provably secure steganography with more effective, high-performance image generation models, and to ensure that secret messages can still be accurately extracted from stego images even after they are uploaded to social networks and subjected to lossy processing such as JPEG compression, we propose a high-quality, provably secure, and robust image steganography method based on state-of-the-art autoregressive (AR) image generation models using Vector-Quantized (VQ) tokenizers. Additionally, we employ a cross-modal error-correction framework that generates stego text from stego images to aid in restoring lossy images, ultimately enabling the extraction of the secret messages embedded within the images. Extensive experiments demonstrate that the proposed method provides advantages in stego quality, embedding capacity, and robustness, while ensuring provable undetectability.
Face de-identification (DeID) has been widely studied for common scenes but remains under-researched for medical scenes, mostly due to the lack of large-scale patient face datasets. In this paper, we release MeMa, a dataset of over 40,000 photo-realistic patient faces re-generated from a large collection of real patient photos. By carefully modulating the generation and data-filtering procedures, MeMa avoids breaching real patient privacy while ensuring rich and plausible medical manifestations. We recruit expert clinicians to annotate MeMa with both coarse- and fine-grained labels, building the first medical-scene DeID benchmark. Additionally, we propose a baseline approach for this new medical-aware DeID task that integrates data-driven medical semantic priors into the DeID procedure. Despite its simplicity, our approach substantially outperforms previous ones.
This paper considers the problem of fair probabilistic binary classification with binary protected groups. The classifier assigns scores, and a practitioner predicts labels using a cut-off threshold chosen according to the desired trade-off between false positives and false negatives; these thresholds are derived from the classifier's ROC curve. The resulting classifier may be unfair to one of the two protected groups in the dataset. It is desirable that, no matter what threshold the practitioner uses, the classifier is fair to both protected groups; that is, the ℒₚ norm between the FPRs and TPRs of the two protected groups should be at most ε. We call such fairness on the ROCs of both protected groups εₚ-Equalized ROC. Given a classifier not satisfying ε₁-Equalized ROC, we aim to design a post-processing method that transforms the given (potentially unfair) classifier's scores into a suitable randomized yet fair classifier, i.e., one that satisfies ε₁-Equalized ROC. First, we introduce a threshold query model on the ROC curves of each protected group. The resulting classifier is bound to incur a reduction in AUC, and with the proposed query model we provide a rigorous theoretical analysis of the minimal AUC loss required to achieve ε₁-Equalized ROC. We then design a linear-time algorithm, FROC, that transforms a given classifier's output into a probabilistic classifier satisfying ε₁-Equalized ROC, and we prove that under certain theoretical conditions FROC achieves the optimal guarantees. We also study the performance of FROC on multiple real-world datasets with many trained classifiers.
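For concreteness, one way to write the εₚ-Equalized ROC condition sketched above, with protected groups a and b and group-wise rates at a common threshold t; this is our hedged reading of the abstract, not the paper's exact statement.

```latex
% Hedged formalization: group-wise ROC operating points stay within epsilon in l_p distance.
\left\lVert \bigl(\mathrm{FPR}_a(t)-\mathrm{FPR}_b(t),\;\; \mathrm{TPR}_a(t)-\mathrm{TPR}_b(t)\bigr) \right\rVert_p \;\le\; \varepsilon
\quad \text{for every threshold } t .
```

Under this reading, whichever cut-off the practitioner picks, the two groups' operating points on the ROC remain within ε of each other, and ε₁-Equalized ROC is the p = 1 case.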
Social norms are standards of behaviour common in a society. However, when agents make decisions without considering how others are impacted, norms can emerge that lead to the subjugation of certain agents. We present RAWL·E, a method to create ethical norm-learning agents. RAWL·E agents operationalise maximin, a fairness principle from Rawlsian ethics, in their decision-making processes to promote ethical norms by balancing societal well-being with individual goals. We evaluate RAWL·E agents in simulated harvesting scenarios. We find that norms emerging in RAWL·E agent societies enhance social welfare, fairness, and robustness, and yield a higher minimum experience than norms that emerge in agent societies that do not implement Rawlsian ethics.
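A hedged sketch of how the maximin principle might enter an agent's learning signal: the shaped reward mixes the agent's own payoff with the change in the minimum well-being across the society. The weighting scheme is an illustrative assumption, not the paper's exact operationalisation.

```python
# Hedged sketch: Rawlsian maximin term folded into an agent's reward.
def shaped_reward(own_reward, wellbeing_before, wellbeing_after, weight=0.5):
    """wellbeing_before/after: lists of per-agent well-being in the society."""
    maximin_gain = min(wellbeing_after) - min(wellbeing_before)
    return (1 - weight) * own_reward + weight * maximin_gain

# An agent whose action raises the worst-off member's well-being gets extra credit,
# which is the kind of signal from which pro-social norms can emerge during learning.
```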
Traffic prediction is pivotal in intelligent transportation systems. Existing works focus mainly on improving overall accuracy, overlooking the crucial question of whether prediction results will lead to biased decisions by transportation authorities. In practice, the uneven deployment of traffic sensors across urban areas produces imbalanced data, making traffic prediction models fail in some urban areas and leading to unfair regional decision-making that severely affects equity and residents' quality of life. Existing fair machine learning models struggle to maintain fair traffic prediction over prolonged periods: although they might achieve fairness at certain time slots, this static fairness breaks down as traffic conditions change. To fill this research gap, we investigate prolonged fair traffic prediction and introduce two novel fairness metrics, region-based static fairness and sensor-based dynamic fairness, tailored to fairness fluctuations over time and across areas. We then propose FairTP, a prolonged fairness traffic prediction framework. FairTP achieves prolonged fairness by alternately "sacrificing" and "benefiting" the prediction accuracy of each traffic sensor or area, ensuring that the numbers of these two actions are balanced over time. Specifically, FairTP incorporates a state identification module to discriminate whether a traffic sensor or area is in a "sacrifice" or "benefit" state, thereby enabling prolonged fairness-aware traffic predictions. Additionally, we devise a state-guided balanced sampling strategy that selects training examples to further enhance prediction fairness by mitigating performance disparities among areas with uneven sensor distribution over time. Extensive experiments on two real-world datasets show that FairTP significantly improves prediction fairness without causing significant accuracy degradation.
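The "sacrifice"/"benefit" bookkeeping could look roughly like the following hedged sketch, where a sensor whose recent error exceeds the network-wide average is labelled as sacrificing and the sampler prioritizes sensors whose sacrifice count has run ahead of their benefit count. Both rules are illustrative assumptions, not the paper's actual modules.

```python
# Hedged sketch: state identification and state-guided balanced sampling.
import random

def identify_states(recent_errors):
    """recent_errors: dict sensor_id -> recent prediction error."""
    avg = sum(recent_errors.values()) / len(recent_errors)
    return {s: "sacrifice" if e > avg else "benefit" for s, e in recent_errors.items()}

def balanced_sample(examples, counts, batch_size):
    """examples: list of dicts with a 'sensor_id'; counts: sensor_id -> {'sacrifice': n, 'benefit': m}.
    Prefer sensors whose cumulative 'sacrifice' count is ahead of 'benefit'."""
    def weight(ex):
        c = counts[ex["sensor_id"]]
        deficit = c["sacrifice"] - c["benefit"]
        return 1.0 + max(0, deficit)   # frequently-sacrificing sensors get priority
    return random.choices(examples, weights=[weight(e) for e in examples], k=batch_size)
```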
Human values and their measurement have long been subjects of interdisciplinary inquiry. Recent advances in AI have sparked renewed interest in this area, with large language models (LLMs) emerging as both tools and subjects of value measurement. This work introduces Generative Psychometrics for Values (GPV), an LLM-based, data-driven value measurement paradigm, theoretically grounded in text-revealed selective perceptions. The core idea is to dynamically parse unstructured texts into perceptions, akin to the static stimuli in traditional psychometrics, measure the value orientations they reveal, and aggregate the results. Applying GPV to human-authored blogs, we demonstrate its stability, validity, and superiority over prior psychological tools. Then, extending GPV to LLM value measurement, we advance the current state of the art with 1) a psychometric methodology that measures LLM values based on their scalable and free-form outputs, enabling context-specific measurement; 2) a comparative analysis of measurement paradigms, revealing response biases of prior methods; and 3) an attempt to bridge LLM values and their safety, revealing the predictive power of different value systems and the impacts of various values on LLM safety. Through this interdisciplinary effort, we aim to leverage AI for next-generation psychometrics and psychometrics for value-aligned AI.
Learning-based methods provide a promising approach to solving highly non-linear control tasks that are often challenging for classical control methods. To ensure the satisfaction of a safety property, learning-based methods jointly learn a control policy together with a certificate function for the property. Popular examples include barrier functions for safety and Lyapunov functions for asymptotic stability. While there has been significant progress on learning-based control with certificate functions in the white-box setting, where the correctness of the certificate function can be formally verified, there has been little work on ensuring their reliability in the black-box setting, where the system dynamics are unknown. In this work, we consider the problems of certifying and repairing neural network control policies and certificate functions in the black-box setting. We propose a novel framework that utilizes runtime monitoring to detect system behaviors that violate the property of interest under an initially trained neural network policy and certificate. These violating behaviors are used to extract new training data, which is then used to re-train the neural network policy and the certificate function and ultimately repair them. We demonstrate the effectiveness of our approach empirically by using it to repair neural network policies learned by a state-of-the-art method for learning-based control and to boost their safety rate on two autonomous system control tasks.
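A minimal sketch of such a monitor-and-repair loop, with all environment, policy, and certificate interfaces as placeholders for the black-box system: roll out the policy, collect states where the monitor flags a violation (for example, the learned barrier certificate dipping below zero or the state entering the unsafe set), and re-train on those counterexamples.

```python
# Hedged sketch: runtime monitoring feeding counterexamples back into training.
# `env`, `policy`, `certificate`, and `train_step` are placeholders, not a real API.
def certify_and_repair(env, policy, certificate, train_step, episodes=100, rounds=5):
    for _ in range(rounds):
        violations = []
        for _ in range(episodes):
            state = env.reset()
            done = False
            while not done:
                next_state, done = env.step(policy(state))
                # Monitor: the certificate should stay non-negative outside the unsafe set.
                if certificate(next_state) < 0 or env.is_unsafe(next_state):
                    violations.append((state, next_state))
                state = next_state
        if not violations:
            return policy, certificate          # no counterexamples observed: done
        policy, certificate = train_step(policy, certificate, violations)
    return policy, certificate
```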
Enabling object detectors to recognize out-of-distribution (OOD) objects is vital for building reliable systems. A primary obstacle is that models frequently receive no supervisory signal from unfamiliar data, leading to overly confident predictions on OOD objects. In contrast to prior work that estimates OOD uncertainty from the detection model and in-distribution (ID) samples alone, we explore using pre-trained vision-language representations for object-level OOD detection. We first discuss the limitations of applying image-level CLIP-based OOD detection methods to object-level scenarios. Building on these insights, we propose RUNA, a novel framework that leverages a dual-encoder architecture to capture rich contextual information and employs a regional uncertainty alignment mechanism to distinguish ID from OOD objects effectively. We also introduce a few-shot fine-tuning approach that aligns region-level semantic representations to further improve the model's ability to discriminate between similar objects. Our experiments show that RUNA substantially surpasses state-of-the-art methods in object-level OOD detection, particularly in challenging scenarios with diverse and complex object instances.