Large generative models are increasingly trained on synthetic data produced by earlier generations of models, raising concerns about *model collapse*, the progressive performance decline consistently observed in empirical studies. However, the theoretical understanding of recursive training dynamics and their failure modes remains limited. In this work, we prove that recursive training inherently leads to exponential error growth unless it is mitigated by a sufficient amount of real data. Addressing the growing scarcity of real data, we introduce a self-verification mechanism that enables models to filter their own outputs based on internal confidence scores, without external validation. Through rigorous analysis, we derive finite-sample error bounds showing that self-verification alone can prevent collapse, even in fully synthetic training regimes. Our theoretical framework extends to large language models (LLMs), characterizing the conditions under which recursive training remains stable without performance degradation.
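To make the setting concrete, the sketch below simulates recursive training on a toy one-dimensional estimation task with an optional confidence-based self-filter. Everything here is an illustrative assumption rather than the paper's formal construction: the Gaussian "model", the corruption process standing in for generation errors, the likelihood-based confidence score, and the `generate`, `run`, and `conf_threshold` names are all hypothetical choices meant only to show how filtering synthetic outputs by the model's own confidence can slow error accumulation across generations.

```python
# Minimal toy sketch of recursive training with confidence-based self-filtering.
# All modelling choices (Gaussian "model", corruption process, confidence score,
# thresholds) are illustrative assumptions, not the paper's construction.
import numpy as np

rng = np.random.default_rng(0)
TRUE_MEAN = 0.0
N, GENERATIONS = 2000, 20
CORRUPT_FRAC = 0.2  # fraction of each generation's outputs that are "bad"


def generate(mean, n):
    """Sample from the current model; a fraction of outputs is corrupted,
    standing in for generation errors that compound across generations."""
    samples = rng.normal(mean, 1.0, n)
    bad = rng.random(n) < CORRUPT_FRAC
    samples[bad] += rng.normal(3.0, 1.0, bad.sum())  # systematic error
    return samples


def run(self_verify, conf_threshold=2.0):
    mean = TRUE_MEAN  # generation 0 is fit on real data (error-free here)
    errors = []
    for _ in range(GENERATIONS):
        synthetic = generate(mean, N)
        if self_verify:
            # Self-verification: keep only samples to which the current model
            # assigns high likelihood, i.e. within `conf_threshold` std devs.
            synthetic = synthetic[np.abs(synthetic - mean) < conf_threshold]
        mean = synthetic.mean()  # next generation trains purely on synthetic data
        errors.append(abs(mean - TRUE_MEAN))
    return errors


print("no filtering     :", round(run(self_verify=False)[-1], 2))
print("self-verification:", round(run(self_verify=True)[-1], 2))
```

In this toy, the unfiltered chain drifts further from the true parameter at every generation because corrupted outputs are fed back into training, whereas the self-verified chain discards most low-confidence samples and stays close to the truth; the actual rates and conditions under which this holds are what the paper's finite-sample bounds characterize.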