2025.naacl-srw.36@ACL

Representing and Clustering Errors in Offensive Language Detection

Authors: Jood Otey, Laura Biester, Steven R Wilson

Content moderation is essential for preventing the spread of harmful content on the Internet. However, moderation sometimes fails, and it is important to understand when and why that happens. Workflows that aim to uncover a system’s weaknesses typically cluster the embeddings of misclassified data points to group errors together. In this paper, we evaluate K-Means clustering of four text representations for the task of offensive language detection in English and Levantine Arabic. We find that Sentence-BERT (SBERT) embeddings give the most human-interpretable clustering of English errors, with groupings based mainly on the group targeted in the text. Meanwhile, SBERT embeddings of Large Language Model (LLM)-generated linguistic features give the most interpretable clustering of Arabic errors.
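The error-clustering workflow the abstract describes can be sketched as follows. This is a minimal illustration, not the paper's implementation: random vectors stand in for SBERT embeddings (which would normally come from a library such as sentence-transformers), and the embedding dimension, number of errors, and cluster count are arbitrary assumptions.

```python
import numpy as np
from sklearn.cluster import KMeans

# Hypothetical stand-in: in the paper, each misclassified text would be
# embedded with SBERT; here we use random 384-dim vectors of the same shape.
rng = np.random.default_rng(0)
error_embeddings = rng.normal(size=(40, 384))  # 40 misclassified examples

# Cluster the error embeddings so that similar errors fall into the same group
kmeans = KMeans(n_clusters=4, n_init=10, random_state=0)
labels = kmeans.fit_predict(error_embeddings)

# Group error indices by cluster for manual inspection of each error type
clusters = {c: np.flatnonzero(labels == c).tolist() for c in range(4)}
```

Inspecting the texts within each cluster is what lets an analyst judge interpretability, e.g. whether English error clusters align with the targeted group.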

Subject: NAACL.2025 - Student Research Workshop