2510.13587

Total: 1

#1 HRM^2Avatar: High-Fidelity Real-Time Mobile Avatars from Monocular Phone Scans [PDF1] [Copy] [Kimi1] [REL]

Authors: Chao Shi, Shenghao Jia, Jinhui Liu, Yong Zhang, Liangchao Zhu, Zhonglei Yang, Jinze Ma, Chaoyue Niu, Chengfei Lv

We present HRM$^2$Avatar, a framework for creating high-fidelity avatars from monocular phone scans, which can be rendered and animated in real time on mobile devices. Monocular capture with smartphones provides a low-cost alternative to studio-grade multi-camera rigs, making avatar digitization accessible to non-expert users. Reconstructing high-fidelity avatars from single-view video sequences poses challenges due to limited visual and geometric data. To address these limitations, at the data level, our method leverages two types of data captured with smartphones: static pose sequences for texture reconstruction and dynamic motion sequences for learning pose-dependent deformations and lighting changes. At the representation level, we employ a lightweight yet expressive representation to reconstruct high-fidelity digital humans from sparse monocular data. We extract garment meshes from monocular data to model clothing deformations effectively, and attach illumination-aware Gaussians to the mesh surface, enabling high-fidelity rendering and capturing pose-dependent lighting. This representation efficiently learns high-resolution and dynamic information from monocular data, enabling the creation of detailed avatars. At the rendering level, real-time performance is critical for animating high-fidelity avatars in AR/VR, social gaming, and on-device creation. Our GPU-driven rendering pipeline delivers 120 FPS on mobile devices and 90 FPS on standalone VR devices at 2K resolution, over $2.7\times$ faster than representative mobile-engine baselines. Experiments show that HRM$^2$Avatar delivers superior visual realism and real-time interactivity, outperforming state-of-the-art monocular methods.

Subject: Graphics

Publish: 2025-10-15 14:18:20 UTC