Human-object interaction (HOI) detection relies on fine-grained visual understanding to distinguish complex relationships between humans and objects. While recent generative diffusion models have demonstrated remarkable capabilities in learning detailed visual concepts through pixel-level generation, their potential for interaction-level relationship modeling remains largely unexplored. To bridge this gap, we propose a Visual Relation Diffusion model (VRDiff), which introduces dense visual relation conditions to guide interaction understanding. Specifically, we encode interaction-aware condition representations that capture both the spatial responsiveness and contextual semantics of human-object pairs, conditioning the diffusion process purely on visual features rather than text-based inputs. Furthermore, we refine these relation representations through generative feedback from the diffusion model, enhancing HOI detection without requiring image synthesis. Extensive experiments on the HICO-DET benchmark demonstrate that VRDiff achieves competitive results under both standard and zero-shot HOI detection settings.
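To make the conditioning and feedback ideas above concrete, the following is a minimal, assumption-based sketch (not the authors' released code): it encodes a human-object pair into a visual relation condition, feeds that condition to a diffusion denoiser over the relation latent, and recovers a refined latent from the predicted noise without synthesizing any image. Module names (e.g., `RelationConditionEncoder`), feature dimensions, and the timestep embedding are illustrative placeholders.

```python
import torch
import torch.nn as nn


class RelationConditionEncoder(nn.Module):
    """Encodes a human-object pair into an interaction-aware condition vector."""

    def __init__(self, feat_dim=256, cond_dim=256):
        super().__init__()
        # Fuse human appearance, object appearance, and an 8-d spatial layout feature.
        self.fuse = nn.Sequential(
            nn.Linear(feat_dim * 2 + 8, cond_dim),
            nn.ReLU(),
            nn.Linear(cond_dim, cond_dim),
        )

    def forward(self, human_feat, object_feat, spatial_feat):
        return self.fuse(torch.cat([human_feat, object_feat, spatial_feat], dim=-1))


class ConditionalDenoiser(nn.Module):
    """Predicts the noise added to a relation latent, given the visual condition."""

    def __init__(self, latent_dim=256, cond_dim=256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(latent_dim + cond_dim + 1, 512),
            nn.ReLU(),
            nn.Linear(512, latent_dim),
        )

    def forward(self, noisy_latent, cond, t):
        t_emb = t.float().unsqueeze(-1) / 1000.0  # simplistic timestep embedding
        return self.net(torch.cat([noisy_latent, cond, t_emb], dim=-1))


def refine_relation(latent, cond, denoiser, t, alpha_bar):
    """One generative-feedback step: denoise the relation latent, no pixel synthesis."""
    noise = torch.randn_like(latent)
    # Forward-diffuse the relation latent to timestep t.
    noisy = alpha_bar.sqrt() * latent + (1 - alpha_bar).sqrt() * noise
    pred_noise = denoiser(noisy, cond, t)
    # Recover a refined latent estimate from the predicted noise.
    return (noisy - (1 - alpha_bar).sqrt() * pred_noise) / alpha_bar.sqrt()


if __name__ == "__main__":
    B, D = 4, 256
    encoder, denoiser = RelationConditionEncoder(D, D), ConditionalDenoiser(D, D)
    cond = encoder(torch.randn(B, D), torch.randn(B, D), torch.randn(B, 8))
    refined = refine_relation(
        torch.randn(B, D), cond, denoiser, torch.full((B,), 500), torch.tensor(0.5)
    )
    print(refined.shape)  # torch.Size([4, 256])
```

In this sketch the refined latent would be passed to an HOI classification head; the design point it illustrates is that the diffusion model acts as a feature refiner conditioned on visual relation cues, rather than as an image generator.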