DanceText: A Training-Free Layered Framework for Controllable Multilingual Text Transformation in Images


Authors: Zhenyu Yu, Mohd Yamani Idna Idris, Hua Wang, Pei Wang, Rizwan Qureshi, Shaina Raza, Aman Chadha, Yong Xiang, Zhixiang Chen

We present DanceText, a training-free framework for multilingual text editing in images, designed to support complex geometric transformations and achieve seamless foreground-background integration. While diffusion-based generative models have shown promise in text-guided image synthesis, they often lack controllability and fail to preserve layout consistency under non-trivial manipulations such as rotation, translation, scaling, and warping. To address these limitations, DanceText introduces a layered editing strategy that separates text from the background, allowing geometric transformations to be performed in a modular and controllable manner. A depth-aware module is further proposed to align appearance and perspective between the transformed text and the reconstructed background, enhancing photorealism and spatial consistency. Importantly, DanceText adopts a fully training-free design by integrating pretrained modules, allowing flexible deployment without task-specific fine-tuning. Extensive experiments on the AnyWord-3M benchmark demonstrate that our method achieves superior visual quality, especially under large-scale and complex transformation scenarios. Code is available at https://github.com/YuZhenyuLindy/DanceText.git.
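The core layered idea (transform a separated text layer on its own, then composite it back over a reconstructed background) can be illustrated with a minimal sketch. It assumes the text has already been isolated as an RGBA layer with values in [0, 1] and the background already inpainted. It uses plain NumPy affine warping with nearest-neighbour sampling and standard alpha compositing. It is not DanceText's implementation, it omits the depth-aware alignment step, and the function names (`affine_matrix`, `warp_layer`, `composite`) are hypothetical.

```python
import numpy as np


def affine_matrix(angle_deg, scale, tx, ty, cx, cy):
    """Forward 2x3 affine map: rotate and scale about (cx, cy), then translate by (tx, ty)."""
    a = np.deg2rad(angle_deg)
    c, s = np.cos(a) * scale, np.sin(a) * scale
    # x' = R * (x - center) + center + t, expanded into a single 2x3 matrix
    return np.array([[c, -s, cx - c * cx + s * cy + tx],
                     [s,  c, cy - s * cx - c * cy + ty]])


def warp_layer(layer, M, out_shape):
    """Warp an (H, W, C) layer by forward affine M using inverse mapping and nearest-neighbour sampling."""
    H, W = out_shape
    inv = np.linalg.inv(np.vstack([M, [0.0, 0.0, 1.0]]))
    ys, xs = np.mgrid[0:H, 0:W]
    coords = np.stack([xs.ravel(), ys.ravel(), np.ones(H * W)])
    src = inv @ coords                      # source (x, y) for every output pixel
    sx = np.rint(src[0]).astype(int)
    sy = np.rint(src[1]).astype(int)
    valid = (sx >= 0) & (sx < layer.shape[1]) & (sy >= 0) & (sy < layer.shape[0])
    out = np.zeros((H * W, layer.shape[2]), dtype=layer.dtype)  # out-of-range pixels stay transparent
    out[valid] = layer[sy[valid], sx[valid]]
    return out.reshape(H, W, layer.shape[2])


def composite(background, text_rgba):
    """Alpha-composite an RGBA text layer over an RGB background (both float in [0, 1])."""
    alpha = text_rgba[..., 3:4]
    return text_rgba[..., :3] * alpha + background * (1.0 - alpha)


# Usage: rotate a horizontal "text" bar by 90 degrees about the image centre.
bg = np.full((64, 64, 3), 0.5)              # stand-in for the inpainted background layer
text = np.zeros((64, 64, 4))
text[28:36, 16:48] = [1.0, 0.0, 0.0, 1.0]   # opaque red bar standing in for a text layer
M = affine_matrix(angle_deg=90, scale=1.0, tx=0, ty=0, cx=32, cy=32)
result = composite(bg, warp_layer(text, M, bg.shape[:2]))
```

Keeping the text in its own layer means any rotation, scaling, or translation is applied only to that layer, so the background pixels are never resampled; a real system would also use a higher-quality interpolation and would need to fill the region the original text occupied.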

Subject: Computer Vision and Pattern Recognition

Published: 2025-04-18 23:46:32 UTC

arXiv: 2504.14108
