
#1 A Gradient Guidance Perspective on Stepwise Preference Optimization for Diffusion Models

Authors: Joshua Tian Jin Tee, Hee Suk Yoon, Abu Hanif Muhammad Syarubany, Eunseop Yoon, Chang D. Yoo

Direct Preference Optimization (DPO) is a key framework for aligning text-to-image models with human preferences; Stepwise Preference Optimization (SPO) extends it to learn preferences over intermediate denoising steps, producing more aesthetically pleasing images at significantly lower computational cost. While effective, SPO's underlying mechanisms remain underexplored. We critically re-examine SPO by formalizing its mechanism as gradient guidance. This lens shows that SPO uses biased temporal weighting, assigning too little weight to later generative steps, and, unlike likelihood-centric views, it reveals substantial noise in the gradient estimates. Leveraging these insights, our GradSPO algorithm introduces a simplified loss and a targeted, variance-informed noise-reduction strategy that enhances training stability. Evaluations on SD 1.5 and SDXL show that GradSPO substantially outperforms leading baselines on human preference, yielding images with markedly improved aesthetics and semantic faithfulness and more robust alignment. Code and models are available at https://github.com/JoshuaTTJ/GradSPO.
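The abstract does not spell out how the variance-informed noise reduction is implemented; the sketch below is purely illustrative of the general idea of downweighting high-variance per-timestep gradient estimates. The function name, tensor shapes, and inverse-variance weighting rule are assumptions, not the paper's actual GradSPO loss.

```python
import torch

def variance_informed_grad(per_sample_grads: torch.Tensor,
                           eps: float = 1e-8) -> torch.Tensor:
    """Aggregate per-timestep preference-gradient estimates with
    inverse-variance weights (hypothetical illustration, not GradSPO).

    per_sample_grads: (batch, timesteps, dim) stochastic gradient
    estimates for each sample and denoising step.
    Returns a single aggregated gradient of shape (dim,).
    """
    # Mean gradient and its scalar variance at each timestep, across the batch.
    mean_g = per_sample_grads.mean(dim=0)             # (timesteps, dim)
    var_g = per_sample_grads.var(dim=0).mean(dim=-1)  # (timesteps,)

    # Downweight noisy timesteps: weights proportional to 1/(var + eps),
    # normalized so every step still contributes (no fixed temporal decay
    # that would systematically underweight later generative steps).
    w = 1.0 / (var_g + eps)
    w = w / w.sum()

    return (w.unsqueeze(-1) * mean_g).sum(dim=0)      # (dim,)


# Usage with dummy data: 8 samples, 50 denoising steps, 16-dim gradients.
g = torch.randn(8, 50, 16)
print(variance_informed_grad(g).shape)  # torch.Size([16])
```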

Subject: NeurIPS.2025 - Poster