Total: 1

#1 Cross-View Isolated Sign Language Recognition via View Synthesis and Feature Disentanglement

Authors: Xin Shen, Xinyu Wang, Lei Shen, Kaihao Zhang, Xin Yu

Cross-view isolated sign language recognition (CV-ISLR) addresses the challenge of identifying isolated signs from viewpoints unseen during training, a problem aggravated by the scarcity of multi-view data in existing benchmarks. To bridge this gap, we introduce a novel two-stage framework comprising View Synthesis and Contrastive Multi-task View-Semantics Recognition. In the View Synthesis stage, we simulate unseen viewpoints by extracting 3D keypoints from the front-view training dataset and synthesizing common-view 2D skeleton sequences via virtual camera rotation, which enriches view diversity without the cost of multi-camera setups. However, training directly on these synthetic samples yields only limited improvement, because viewpoint-specific and semantics-specific features remain entangled. To overcome this drawback, we present a Contrastive Multi-task View-Semantics Recognition (CMVSR) module that disentangles viewpoint-dependent features from sign semantics. CMVSR thereby obtains view-invariant representations of sign videos, leading to recognition performance that is robust across camera viewpoints. We evaluate our approach on the MM-WLAuslan dataset, the first benchmark for CV-ISLR, and on MTV-Test, our extended protocol that includes additional multi-view data captured in the wild. Experimental results demonstrate that our method not only improves the accuracy of front-view skeleton-based isolated sign language recognition, but also exhibits superior generalization to novel viewpoints.
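The View Synthesis stage, as described, amounts to rotating extracted 3D keypoints with a virtual camera and re-projecting them to 2D. Below is a minimal sketch of that idea, assuming an orthographic projection and a yaw-only virtual camera; the paper's exact camera model and keypoint extractor are not given in the abstract, so the function names and joint layout are illustrative.

```python
import numpy as np

def rotation_y(theta: float) -> np.ndarray:
    """3x3 rotation about the vertical (y) axis by theta radians."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, 0.0, s],
                     [0.0, 1.0, 0.0],
                     [-s, 0.0, c]])

def synthesize_view(keypoints_3d: np.ndarray, yaw_deg: float) -> np.ndarray:
    """Rotate a (T, J, 3) keypoint sequence by a virtual camera yaw and
    orthographically project it to a (T, J, 2) 2D skeleton sequence."""
    R = rotation_y(np.deg2rad(yaw_deg))
    rotated = keypoints_3d @ R.T       # rotate every joint in every frame
    return rotated[..., :2]            # drop depth: orthographic x-y projection

# Enrich one front-view sequence with several common side viewpoints.
seq = np.random.randn(64, 21, 3)       # 64 frames, 21 joints (placeholder data)
synthetic = {yaw: synthesize_view(seq, yaw) for yaw in (-45, -20, 20, 45)}
```

For the CMVSR module, the abstract describes a multi-task design that separates viewpoint prediction from sign classification while a contrastive objective encourages view-invariant semantics features. The sketch below pairs two linear heads with a standard supervised contrastive loss; `CMVSRHead`, `supcon_loss`, and the equal loss weighting are assumptions for illustration, not the paper's implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class CMVSRHead(nn.Module):
    """Two parallel branches on a shared backbone feature: one classifies
    the sign (semantics), the other classifies the camera viewpoint."""
    def __init__(self, feat_dim: int, num_signs: int, num_views: int):
        super().__init__()
        self.sem_proj = nn.Linear(feat_dim, feat_dim)    # semantics branch
        self.view_proj = nn.Linear(feat_dim, feat_dim)   # viewpoint branch
        self.sem_cls = nn.Linear(feat_dim, num_signs)
        self.view_cls = nn.Linear(feat_dim, num_views)

    def forward(self, feat: torch.Tensor):
        z_sem = self.sem_proj(feat)
        return self.sem_cls(z_sem), self.view_cls(self.view_proj(feat)), z_sem

def supcon_loss(z: torch.Tensor, labels: torch.Tensor, tau: float = 0.1):
    """Supervised contrastive loss: pull together semantics features of the
    same sign rendered from different views, push apart different signs."""
    z = F.normalize(z, dim=1)
    sim = z @ z.T / tau
    self_mask = torch.eye(z.size(0), dtype=torch.bool, device=z.device)
    pos_mask = (labels[:, None] == labels[None, :]) & ~self_mask
    sim = sim.masked_fill(self_mask, float("-inf"))      # exclude self-pairs
    log_prob = sim - torch.logsumexp(sim, dim=1, keepdim=True)
    per_anchor = -log_prob.masked_fill(~pos_mask, 0.0).sum(1) \
                 / pos_mask.sum(1).clamp(min=1)
    return per_anchor[pos_mask.any(1)].mean()

# Toy usage: each sign appears under two synthetic views in the batch.
feat = torch.randn(8, 256)                               # backbone features
signs = torch.tensor([0, 0, 1, 1, 2, 2, 3, 3])
views = torch.tensor([0, 1, 0, 1, 0, 1, 0, 1])
head = CMVSRHead(256, num_signs=100, num_views=2)
sem_logits, view_logits, z_sem = head(feat)
loss = (F.cross_entropy(sem_logits, signs)               # sign recognition
        + F.cross_entropy(view_logits, views)            # viewpoint task
        + supcon_loss(z_sem, signs))                     # view-invariant semantics
```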

Subject: ICCV.2025 - Poster