2025.emnlp-main.1026@ACL

Total: 1

#1 Learning Subjective Label Distributions via Sociocultural Descriptors [PDF] [Copy] [Kimi] [REL]

Authors: Mohammed Fayiz Parappan, Ricardo Henao

Subjectivity in NLP tasks, _e.g._, toxicity classification, has emerged as a critical challenge precipitated by the increased deployment of NLP systems in content-sensitive domains. Conventional approaches aggregate annotator judgements (labels), ignoring minority perspectives, and overlooking the influence of the sociocultural context behind such annotations. We propose a framework where subjectivity in binary labels is modeled as an empirical distribution accounting for the variation in annotators through human values extracted from sociocultural descriptors using a language model. The framework also allows for downstream tasks such as population and sociocultural group-level majority label prediction. Experiments on three toxicity datasets covering human-chatbot conversations and social media posts annotated with diverse annotator pools demonstrate that our approach yields well-calibrated toxicity distribution predictions across binary toxicity labels, which are further used for majority label prediction across cultural subgroups, improving over existing methods.

Subject: EMNLP.2025 - Main