nUFGbRWl5W@OpenReview

#1 OSTAR: Optimized Statistical Text-classifier with Adversarial Resistance

Authors: Yuhan Yao, Feifei Kou, Lei Shi, Xiao Yang, Zhongbao Zhang, Suguo Zhu, Jiwei Zhang, Lirong Qiu, LI Haisheng

Advances in generative models and real-world attacks involving machine-generated text (MGT) create a demand for more robust detection methods. Existing MGT detection methods for adversarial environments consist primarily of manually designed statistical methods and fine-tuned classifiers. Statistical methods extract intrinsic features but suffer from rigid decision boundaries that are vulnerable to adaptive attacks, while fine-tuned classifiers achieve outstanding performance at the cost of overfitting to superficial textual features. We argue that the key to detection in current adversarial environments lies in extracting intrinsic, invariant features and ensuring that the classifier can adapt dynamically. To this end, we propose OSTAR, a novel MGT detection framework designed for adversarial environments, composed of a statistically enhanced classifier and a Multi-Faceted Contrastive Learning (MFCL) strategy. On the classifier side, our Multi-Dimensional Statistical Profiling (MDSP) module extracts intrinsic differences between human and machine texts, complementing the classifier with stable, useful features. On the optimization side, the MFCL strategy enhances robustness by contrasting feature variations before and after text attacks, jointly optimizing the statistical feature mapping and the baseline pre-trained models. Experimental results on three public datasets under various adversarial scenarios demonstrate that our framework outperforms existing MGT detection methods, achieving state-of-the-art performance and robustness against attacks. The code is available at https://github.com/BUPT-SN/OSTAR.
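
The abstract does not specify the form of the MFCL objective. As a rough illustration of the general idea of contrasting features before and after a text attack, the sketch below uses a generic InfoNCE-style loss in plain Python. This is an assumption, not OSTAR's actual loss: the function names (`cosine`, `info_nce`), the temperature value, and the toy feature vectors are all hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def info_nce(clean, attacked, temperature=0.1):
    """Generic InfoNCE loss (illustrative, not the paper's MFCL).

    clean[i] and attacked[i] are assumed to be feature vectors of the same
    text before and after an attack. Each clean vector is pulled toward its
    own attacked version and pushed away from the other attacked vectors.
    """
    losses = []
    for i, anchor in enumerate(clean):
        logits = [cosine(anchor, other) / temperature for other in attacked]
        # Numerically stable log-sum-exp over all candidates.
        m = max(logits)
        log_denom = m + math.log(sum(math.exp(x - m) for x in logits))
        # Negative log-probability of picking the matching attacked vector.
        losses.append(log_denom - logits[i])
    return sum(losses) / len(losses)


# Toy features: each attacked vector is a slightly perturbed clean vector.
clean = [[1.0, 0.0, 0.2], [0.0, 1.0, 0.1], [0.3, 0.2, 1.0]]
attacked = [[0.9, 0.1, 0.2], [0.1, 0.9, 0.2], [0.2, 0.3, 0.9]]

aligned = info_nce(clean, attacked)
# Rotating the attacked list breaks the pairing, so the loss should rise.
shuffled = info_nce(clean, attacked[1:] + attacked[:1])
print(aligned < shuffled)
```

Under this kind of objective, features that stay close across an attack yield a low loss, which is one way to encourage attack-invariant representations; the authors' repository should be consulted for the objective they actually use.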

Subject: NeurIPS.2025 - Poster