Do Biased Models Have Biased Thoughts?

Authors: Swati Rajwal, Shivank Garg, Reem Abdel-Salam, Abdelrahman Zayed

The impressive performance of language models is undeniable. However, biases based on gender, race, socio-economic status, physical appearance, and sexual orientation make deploying language models challenging. This paper studies the effect on fairness of chain-of-thought prompting, a recent approach that exposes the steps a model follows before it responds. More specifically, we ask the following question: $\textit{Do biased models have biased thoughts}$? To answer it, we conduct experiments on $5$ popular large language models, using fairness metrics to quantify $11$ different biases in the models' thoughts and outputs. Our results show that bias in the thinking steps is not highly correlated with bias in the output (less than $0.6$ correlation with a $p$-value smaller than $0.001$ in most cases). In other words, unlike human beings, the tested models with biased decisions do not always possess biased thoughts.

Subjects: Computation and Language, Artificial Intelligence

Publish: 2025-08-08 19:41:20 UTC

2508.06671
