Wait, am I Being Fair? Characterizing Deductive Stereotyping and Mitigating It with Fair-GCG

Authors: Naihao Deng, Yilun Zhu, Joan Nwatu, Clayton Scott, Rada Mihalcea

Warning: This paper contains several toxic and offensive statements. While reasoning generally improves fairness in recent large language models (LLMs), failures persist. In this work, we identify a failure mode, deductive stereotyping, in which models apply population-level statistical regularities to individual cases, producing logically coherent yet socially biased inferences. We provide a statistical interpretation of this phenomenon. To steer models toward fairness-aware reasoning, we propose a reasoning-time injection framework. We further introduce Fair-GCG to systematically discover effective injection phrases. Injection phrases discovered by Fair-GCG improve performance across multiple fairness benchmarks, generalize from smaller to larger LLMs, improve reasoning-level fairness, reduce bias in open-ended generation, and transfer to real-world fairness-sensitive tasks.

Subjects: Computation and Language, Artificial Intelligence

Published: 2026-06-30 00:00:42 UTC

2606.30989
