Total: 1
Large language models (LLMs) have achieved human-level text generation, emphasizing the need for effective deepfake text detection to mitigate risks like the spread of fake news and plagiarism. Existing research has been constrained by evaluating detection methods o specific domains or particular language models. In practical scenarios, however, the detector faces texts from various domains or LLMs without knowing their sources. To this end, we build a comprehensive testbed by gathering texts from diverse human writings and deepfake texts generated by different LLMs. Empirical results on mainstream detection methods demonstrate the difficulties associated with detecting deepfake text in a wide-ranging testbed, particularly in out-of-distribution scenarios. Such difficulties align with the diminishing linguistic differences between the two text sources. Despite challenges, the top-performing detector can identify 84.12% out-of-domain texts generated by a new LLM, indicating the feasibility for application scenarios.