
The Alignment Game: A Theory of Long-Horizon Alignment Through Recursive Curation

Authors: Ali Falahati, Mohammad Mohammadi Amiri, Kate Larson, Lukasz Golab

In self-consuming generative models that train on their own outputs, alignment with user preferences becomes a recursive rather than a one-time process. In this paper, we provide the first formal foundation for analyzing the long-term effects of such recursive retraining on alignment. Under a two-stage curation mechanism based on the Bradley–Terry (BT) model, we model alignment as an interaction between two factions: the Model Owner, who filters which outputs the model should learn from, and the Public User, who determines which outputs are ultimately shared and retained through interactions with the model. Our analysis reveals three structural convergence regimes, depending on the degree of preference alignment: consensus collapse, compromise on shared optima, and asymmetric refinement. We prove a fundamental impossibility theorem: no recursive BT-based curation mechanism can simultaneously preserve diversity, ensure symmetric influence, and eliminate dependence on initialization. Framing the process as dynamic social choice, we show that alignment is not a static goal but an evolving equilibrium shaped by power asymmetries and path dependence.
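The two-stage mechanism described above can be sketched in a few lines. This is a minimal illustration, not the paper's actual algorithm: the function names (`bt_prob`, `curate`), the toy outputs, and the Owner/User score tables are all hypothetical; only the Bradley–Terry win probability and the filter-then-retain structure follow the abstract.

```python
import math
import random

def bt_prob(s_i, s_j):
    # Bradley-Terry probability that an item with score s_i
    # is preferred over an item with score s_j.
    return math.exp(s_i) / (math.exp(s_i) + math.exp(s_j))

def curate(outputs, scores, rng):
    # Keep an output if it wins a BT pairwise comparison
    # against a randomly drawn peer from the same pool.
    kept = []
    for out in outputs:
        peer = outputs[rng.randrange(len(outputs))]
        if rng.random() < bt_prob(scores[out], scores[peer]):
            kept.append(out)
    return kept

rng = random.Random(0)
outputs = ["a", "b", "c", "d"]
owner_scores = {"a": 2.0, "b": 1.0, "c": 0.0, "d": -1.0}  # hypothetical preferences
user_scores  = {"a": -1.0, "b": 0.0, "c": 1.0, "d": 2.0}  # hypothetical, misaligned with owner

# Stage 1: the Model Owner filters candidate outputs.
stage1 = curate(outputs, owner_scores, rng)
# Stage 2: the Public User decides what is shared and retained.
stage2 = curate(stage1, user_scores, rng) if stage1 else []
```

Iterating these two stages, with the retained set feeding the next round of training, is the recursive process whose long-run regimes the paper characterizes; when `owner_scores` and `user_scores` disagree, as here, the surviving pool reflects the interplay of both preference orders.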

Subject: AAAI.2026 - Special Track on AI Alignment