Sarcasm Detection on Reddit Using Classical Machine Learning and Feature Engineering

Sarcasm is common in online discussions, yet difficult for machines to identify because the intended meaning often contradicts the literal wording. In this work, I study sarcasm detection using only classical machine learning methods and explicit feature engineering, without relying on neural networks or on context from parent comments. Using a 100,000-comment subsample of the Self-Annotated Reddit Corpus (SARC 2.0), I combine word-level and character-level TF-IDF features with simple stylistic indicators. Four models are evaluated: logistic regression, a linear SVM, multinomial Naive Bayes, and a random forest. Naive Bayes and logistic regression perform best, achieving F1-scores of around 0.57 on the sarcastic class. Although the lack of conversational context limits performance, the results offer a clear and reproducible baseline for sarcasm detection using lightweight and interpretable methods.
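The abstract does not say which "simple stylistic indicators" are used, so the following is only a minimal sketch of what such features might look like. The function name `stylistic_features` and every feature in it (punctuation counts, uppercase ratio, repeated-character runs) are illustrative assumptions, not the author's feature set.

```python
import re


def stylistic_features(comment: str) -> dict[str, float]:
    """Compute a few hand-crafted style cues for one comment.

    All features here are hypothetical examples; the paper's abstract
    does not list the indicators it actually uses.
    """
    tokens = comment.split()
    letters = [c for c in comment if c.isalpha()]
    n_upper = sum(1 for c in letters if c.isupper())
    return {
        # Length cues
        "n_chars": float(len(comment)),
        "n_tokens": float(len(tokens)),
        # Punctuation cues often associated with emphasis
        "n_exclamations": float(comment.count("!")),
        "n_question_marks": float(comment.count("?")),
        "has_ellipsis": float("..." in comment),
        # Share of alphabetic characters that are uppercase (0.0 if no letters)
        "uppercase_ratio": n_upper / len(letters) if letters else 0.0,
        # Runs of the same word character repeated 3+ times, e.g. "suuure"
        "n_char_runs": float(len(re.findall(r"(\w)\1{2,}", comment))),
    }


if __name__ == "__main__":
    print(stylistic_features("Oh suuure, that will TOTALLY work!!!"))
```

In a setup like the one described, a dense vector of such values would typically be concatenated with the sparse word-level and character-level TF-IDF matrices before training a linear model. Because multinomial Naive Bayes expects non-negative inputs, any scaling applied to these features would need to keep them non-negative.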

Subjects: Computation and Language, Machine Learning

Published: 2025-12-04 02:41:08 UTC

arXiv: 2512.04396