2506.01955

Dual-Process Image Generation

Authors: Grace Luo, Jonathan Granskog, Aleksander Holynski, Trevor Darrell

Prior methods for controlling image generation are limited in their ability to be taught new tasks. In contrast, vision-language models, or VLMs, can learn tasks in-context and produce the correct outputs for a given input. We propose a dual-process distillation scheme that allows feed-forward image generators to learn new tasks from deliberative VLMs. Our scheme uses a VLM to rate the generated images and backpropagates the gradient of this rating to update the weights of the image generator. Our general framework enables a wide variety of new control tasks through the same text-and-image-based interface. We showcase a handful of applications of this technique for different types of control signals, such as commonsense inferences and visual prompts. With our method, users can implement multimodal controls for properties such as color palette, line weight, horizon position, and relative depth within a matter of minutes. Project page: https://dual-process.github.io.
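
The update described in the abstract (rate a generated image, then backpropagate that rating into the generator's weights) can be illustrated with a toy sketch. Everything below is an assumption made for illustration: the function names are hypothetical, the "generator" is a single linear layer, and the "critic" is a hand-written differentiable brightness score standing in for a VLM. It is not the authors' implementation.

```python
# Toy sketch: a frozen differentiable "critic" rates a generated image, and the
# gradient of that rating is backpropagated into the generator's weights.
# All names are hypothetical; the critic is a stand-in, not a real VLM.

def generate(weights, latent):
    """Toy feed-forward generator: each pixel is weight * latent value."""
    return [w * z for w, z in zip(weights, latent)]

def critic_loss(image, target_brightness):
    """Stand-in rating: squared error of mean brightness (lower is better)."""
    mean = sum(image) / len(image)
    return (mean - target_brightness) ** 2

def critic_grad(image, target_brightness):
    """Gradient of critic_loss with respect to each pixel."""
    n = len(image)
    mean = sum(image) / n
    return [2.0 * (mean - target_brightness) / n for _ in image]

def distill(weights, latent, target_brightness, lr=0.5, steps=200):
    """Update only the generator weights; the critic itself is never changed."""
    weights = list(weights)  # copy so the caller's list is left untouched
    for _ in range(steps):
        image = generate(weights, latent)
        grad_image = critic_grad(image, target_brightness)
        # Chain rule: d(loss)/d(w_i) = d(loss)/d(pixel_i) * d(pixel_i)/d(w_i)
        grad_weights = [g * z for g, z in zip(grad_image, latent)]
        weights = [w - lr * gw for w, gw in zip(weights, grad_weights)]
    return weights
```

Calling `distill([0.1] * 4, [1.0, 0.5, -0.5, 2.0], 0.8)` returns weights whose generated image scores a lower `critic_loss` than the starting weights. In a real system the critic would be a pretrained VLM and the gradients would come from an automatic differentiation library rather than hand-derived formulas.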

Subjects: Computer Vision and Pattern Recognition, Computation and Language, Machine Learning

Published: 2025-06-02 17:59:56 UTC