2025.acl-long.120@ACL


Behind Closed Words: Creating and Investigating the forePLay Annotated Dataset for Polish Erotic Discourse

Authors: Anna Kołos, Katarzyna Lorenc, Emilia Wiśnios, Agnieszka Karlińska

The surge in online content has created an urgent demand for robust content detection systems, especially in non-English contexts, where current tools show significant limitations. We introduce forePLay, a novel Polish-language dataset for erotic content detection comprising over 24,000 annotated sentences. The dataset features a multidimensional taxonomy that captures ambiguity, violence, and socially unacceptable behaviors. Our comprehensive evaluation demonstrates that specialized Polish language models outperform multilingual alternatives, with transformer-based architectures showing particular strength in handling imbalanced categories. The dataset and accompanying analysis establish essential frameworks for developing linguistically aware content moderation systems, while highlighting critical considerations for extending such capabilities to morphologically complex languages.

Subject: ACL.2025 - Long Papers