Latent Adversarial Regularization for Offline Preference Optimization

Authors: Enyi Jiang, Yibo Jacky Zhang, Yinglun Xu, Andreas Haupt, Nancy Amato, Sanmi Koyejo

Learning from human feedback typically relies on preference optimization that constrains policy updates through token-level regularization. However, preference optimization for language models is particularly challenging because token-space similarity does not imply semantic or behavioral similarity. To address this challenge, we leverage latent-space regularization for language model preference optimization. We introduce GANPO, which achieves latent-space regularization by penalizing divergence between the internal representations of a policy model and a reference model. Given that latent representations are not associated with explicit probability densities, we adopt an adversarial approach inspired by GANs to minimize latent-space divergence. We integrate GANPO as a regularizer into existing offline preference optimization objectives. Experiments across multiple model architectures and tasks show consistent improvements from latent-space regularization. Further, by comparing GANPO-induced inferential biases with those from token-level regularization, we find that GANPO provides more robust structural feedback under distributional shift and noise while maintaining comparable downstream performance with minor computational overhead.
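The abstract gives no formulas, so the following is only a minimal sketch of how an adversarial latent-space penalty could be added to an offline preference loss. It assumes a DPO-style base objective, a linear logistic discriminator over a single hidden-state vector, and the standard non-saturating GAN loss. All function names, the weight `lam`, and the choice of discriminator are hypothetical and are not taken from the paper.

```python
import math


def softplus(x):
    # Numerically stable log(1 + exp(x)).
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def dpo_loss(lp_pol_chosen, lp_pol_rejected, lp_ref_chosen, lp_ref_rejected, beta=0.1):
    # Standard DPO loss for one preference pair, given sequence log-probabilities.
    margin = beta * ((lp_pol_chosen - lp_ref_chosen) - (lp_pol_rejected - lp_ref_rejected))
    return softplus(-margin)  # equals -log(sigmoid(margin))


def disc_logit(h, w, b):
    # Assumption: a linear discriminator over one latent vector h.
    return sum(wi * hi for wi, hi in zip(w, h)) + b


def discriminator_loss(h_ref, h_pol, w, b):
    # The discriminator is trained to score reference latents as "real" (1)
    # and policy latents as "fake" (0).
    return softplus(-disc_logit(h_ref, w, b)) + softplus(disc_logit(h_pol, w, b))


def latent_penalty(h_pol, w, b):
    # Non-saturating generator loss: the policy is pushed to produce latents
    # that the discriminator scores as reference-like.
    return softplus(-disc_logit(h_pol, w, b))


def regularized_loss(lps, h_pol, w, b, beta=0.1, lam=0.5):
    # Assumed combination: preference loss plus a weighted latent penalty.
    return dpo_loss(*lps, beta=beta) + lam * latent_penalty(h_pol, w, b)
```

In a real training loop the discriminator and the policy would be updated in alternation, with `discriminator_loss` minimized over the discriminator parameters and `regularized_loss` minimized over the policy parameters. The hidden states would come from a chosen layer of each model rather than from hand-written vectors.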

Subjects: Machine Learning, Artificial Intelligence

Published: 2026-01-29 18:21:57 UTC

2601.22083
