
Jia et al., Reward Fine-Tuning Two-Step Diffusion Models via Learning Differentiable Latent-Space Surrogate Reward, CVPR 2025 (CVF)

Total: 1

#1 Reward Fine-Tuning Two-Step Diffusion Models via Learning Differentiable Latent-Space Surrogate Reward [PDF]

Authors: Zhiwei Jia, Yuesong Nan, Huixi Zhao, Gengdai Liu

Recent research has shown that fine-tuning diffusion models (DMs) with arbitrary rewards, including non-differentiable ones, is feasible with reinforcement learning (RL) techniques, offering great flexibility in model alignment. However, it is challenging to apply existing RL methods to timestep-distilled DMs for ultra-fast (2-step) image generation. Our analysis suggests several limitations of policy-based RL methods such as PPO or DPO for improving 2-step image generation. Based on these insights, we propose to fine-tune DMs with learned differentiable surrogate rewards. Our method, named LaSRO, learns surrogate reward models in the latent space of SDXL to convert arbitrary rewards into differentiable ones for efficient reward gradient guidance. LaSRO leverages pre-trained latent DMs for reward modeling and specifically targets 2-step image generation for reward optimization, enhancing generalizability and efficiency. We show that LaSRO is effective and stable for improving ultra-fast image generation with different reward objectives, outperforming popular RL methods including those based on PPO or DPO. We further show LaSRO's connection to value-based RL, providing theoretical insights behind it.
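For illustration only, below is a minimal PyTorch sketch of the general idea the abstract describes: fit a differentiable surrogate that predicts an arbitrary (possibly non-differentiable) reward from latents, then backpropagate the surrogate's gradient into a generator. This is not the authors' implementation; the module names, network sizes, latent shapes, and the toy black-box reward are all hypothetical placeholders.

```python
import torch
import torch.nn as nn

class LatentSurrogateReward(nn.Module):
    """Toy latent-space reward surrogate (hypothetical architecture, not LaSRO's)."""
    def __init__(self, latent_dim: int = 4 * 64 * 64, hidden: int = 512):
        super().__init__()
        self.net = nn.Sequential(
            nn.Flatten(),
            nn.Linear(latent_dim, hidden), nn.SiLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, latents: torch.Tensor) -> torch.Tensor:
        return self.net(latents).squeeze(-1)

def black_box_reward(latents: torch.Tensor) -> torch.Tensor:
    # Stand-in for an arbitrary, non-differentiable reward signal
    # (e.g. a detector score or human preference); dummy scalar per sample.
    return latents.mean(dim=(1, 2, 3)).detach()

surrogate = LatentSurrogateReward()
opt_r = torch.optim.Adam(surrogate.parameters(), lr=1e-4)

# Step 1: regress the surrogate onto the black-box reward over sampled latents,
# making the reward signal differentiable with respect to the latents.
for _ in range(10):
    latents = torch.randn(8, 4, 64, 64)
    target = black_box_reward(latents)
    loss = nn.functional.mse_loss(surrogate(latents), target)
    opt_r.zero_grad(); loss.backward(); opt_r.step()

# Step 2: use the surrogate's gradient to guide a generator.
# `generator` is a trivial stand-in for the latent output of a 2-step
# timestep-distilled diffusion model.
generator = nn.Linear(64, 4 * 64 * 64)
opt_g = torch.optim.Adam(generator.parameters(), lr=1e-5)

noise = torch.randn(8, 64)
fake_latents = generator(noise).view(8, 4, 64, 64)
reward = surrogate(fake_latents).mean()
opt_g.zero_grad()
(-reward).backward()  # maximize the surrogate reward via its gradient
opt_g.step()
```

In this toy setup the surrogate plays the role the abstract attributes to the latent-space reward model: it converts a non-differentiable reward into one whose gradients can directly update the generator, rather than relying on policy-based RL updates.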

Subject: CVPR.2025 - Poster