847@2024@IJCAI

GEM: Generating Engaging Multimodal Content

Authors: Chongyang Gao, Yiren Jian, Natalia Denisenko, Soroush Vosoughi, V. S. Subrahmanian

Generating engaging multimodal content is a key objective in many applications, such as online advertisements that capture user attention by combining images and text. In this paper, we introduce GEM, a novel framework for generating engaging multimodal image-text posts. GEM operates in two phases. First, it combines a pre-trained engagement discriminator with a technique for deriving an effective continuous prompt for the Stable Diffusion model. Second, it applies an iterative algorithm that produces coherent and compelling image-sentence pairs centered on a specified topic of interest. Through experimental analysis and human evaluation, we show that the image-sentence pairs generated by GEM surpass several established baselines in both engagement and alignment.

Subject: IJCAI.2024 - Others