Towards Explainable Hate Speech Detection

Authors: Happy Khairunnisa Sariyanto, Diclehan Ulucan, Oguzhan Ulucan, Marc Ebner

Recent advancements in deep learning have significantly enhanced the efficiency and accuracy of natural language processing (NLP) tasks. However, these models often require substantial computational resources, which remains a major drawback. Reducing the complexity of deep learning architectures and exploring simpler yet effective approaches can lead to cost-efficient NLP solutions. This is also a step towards explainable AI, i.e., uncovering how a particular task is carried out. For this analysis, we chose the task of hate speech detection. We address it by introducing a model that classifies text using a weighted sum of valence, arousal, and dominance (VAD) scores. To determine the optimal weights and classification strategies, we analyze hate speech and non-hate speech words based on both their individual and summed VAD values. Our experimental results demonstrate that this straightforward approach can compete with state-of-the-art neural network methods, including GPT-based models, in detecting hate speech.
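The idea of classifying with a weighted sum of VAD scores can be illustrated with a minimal sketch. It is not the authors' implementation: the toy lexicon, the weights, the threshold, the averaging over known words, and the names `TOY_LEXICON`, `vad_score`, and `is_hate_speech` are all assumptions chosen for illustration, since the abstract does not state them.

```python
from typing import Dict, Optional, Sequence, Tuple

VAD = Tuple[float, float, float]  # (valence, arousal, dominance), each in [0, 1]

# Hypothetical toy lexicon; the values are made up for illustration only.
TOY_LEXICON: Dict[str, VAD] = {
    "love": (0.9, 0.6, 0.6),
    "friend": (0.8, 0.4, 0.5),
    "hate": (0.1, 0.8, 0.6),
    "disgusting": (0.1, 0.7, 0.4),
}


def vad_score(
    tokens: Sequence[str],
    lexicon: Dict[str, VAD],
    weights: VAD,
) -> Optional[float]:
    """Average the weighted VAD sum over tokens found in the lexicon.

    Returns None when no token is covered by the lexicon.
    """
    w_v, w_a, w_d = weights
    scores = [
        w_v * v + w_a * a + w_d * d
        for v, a, d in (lexicon[t] for t in tokens if t in lexicon)
    ]
    return sum(scores) / len(scores) if scores else None


def is_hate_speech(
    text: str,
    lexicon: Dict[str, VAD] = TOY_LEXICON,
    weights: VAD = (-1.0, 0.5, 0.5),  # assumed weights, not from the paper
    threshold: float = 0.3,           # assumed decision threshold
) -> bool:
    """Flag text whose averaged weighted VAD score exceeds the threshold."""
    score = vad_score(text.lower().split(), lexicon, weights)
    return score is not None and score > threshold
```

Because the decision reduces to three weights and one threshold, each prediction can be traced back to the words that contributed to the score, which is the sense in which such a model is easier to explain than a deep network.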

Subject: ACL.2025 - Findings

2025.findings-acl.667@ACL