Something Just Like TRuST: Toxicity Recognition of Span and Target

Authors: Berk Atil, Namrata Sureddy, Rebecca J. Passonneau

Toxicity in online content, including content generated by language models, has become a critical concern because of its potential for negative psychological and social impact. This paper introduces TRuST, a comprehensive dataset designed to improve toxicity detection. TRuST merges existing datasets and provides labels for toxicity, target social group, and toxic spans. It covers a diverse range of target groups, such as ethnicity, gender, religion, disability, and politics, and includes both human- and machine-annotated data as well as both human- and machine-generated text. We benchmark state-of-the-art large language models (LLMs) on three tasks: toxicity detection, target group identification, and toxic span extraction. We find that fine-tuned models consistently outperform zero-shot and few-shot prompting, though performance remains low for certain social groups. Further, reasoning capabilities do not significantly improve performance, indicating that LLMs have weak social reasoning skills.
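
The three label types described above (toxicity, target social group, toxic spans) can be pictured as one record per text. The sketch below is a hypothetical illustration, not TRuST's published schema or evaluation code. The field names, the [start, end) character-offset encoding of spans, the character-level span F1 (a metric commonly used for toxic span detection), and the per-group accuracy breakdown are all assumptions made for illustration.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class ToxicityExample:
    # Hypothetical record layout; field names are illustrative, not TRuST's actual schema.
    text: str
    is_toxic: bool
    target_groups: list[str] = field(default_factory=list)  # e.g. ["religion"]
    toxic_spans: list[tuple[int, int]] = field(default_factory=list)  # [start, end) char offsets


def span_chars(spans):
    """Expand [start, end) character spans into a set of character offsets."""
    return {i for start, end in spans for i in range(start, end)}


def span_f1(predicted, gold):
    """Character-level F1 between predicted and gold toxic spans (assumed metric)."""
    pred, ref = span_chars(predicted), span_chars(gold)
    if not pred and not ref:
        return 1.0  # both empty: treated as perfect agreement
    overlap = len(pred & ref)
    if overlap == 0:
        return 0.0  # covers the cases where only one side is empty
    precision = overlap / len(pred)
    recall = overlap / len(ref)
    return 2 * precision * recall / (precision + recall)


def per_group_accuracy(examples, predictions):
    """Toxicity-label accuracy broken down by target group (illustrative)."""
    correct, total = defaultdict(int), defaultdict(int)
    for ex, pred in zip(examples, predictions):
        # Examples without a target group are pooled under "none".
        for group in ex.target_groups or ["none"]:
            total[group] += 1
            correct[group] += int(pred == ex.is_toxic)
    return {g: correct[g] / total[g] for g in total}
```

For instance, `ToxicityExample(text="they are all <slur>", is_toxic=True, target_groups=["ethnicity"], toxic_spans=[(13, 19)])` marks the placeholder token as the toxic span. A prediction of `[(9, 16)]` against that gold span overlaps on 3 characters and scores an F1 of 6/13. A per-group breakdown like `per_group_accuracy` is one way to surface the kind of uneven performance across social groups that the abstract reports.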

Subjects: Computation and Language, Artificial Intelligence

Published: 2025-06-02 23:48:16 UTC

2506.02326