GIViC: Generative Implicit Video Compression

Authors: Ge Gao, Siyue Teng, Tianhao Peng, Fan Zhang, David Bull

While video compression based on implicit neural representations (INRs) has recently demonstrated great potential, existing INR-based video codecs still cannot match the state-of-the-art (SOTA) performance of their conventional or autoencoder-based counterparts under the same coding configuration. In this context, we propose a **G**enerative **I**mplicit **Vi**deo **C**ompression framework, **GIViC**, which aims to advance the performance limits of this type of coding method. GIViC is inspired by the characteristics that INRs share with large language and diffusion models in exploiting *long-term dependencies*. Through the newly designed *implicit diffusion* process, GIViC performs diffusive sampling across coarse-to-fine spatiotemporal decompositions, gradually progressing from coarser-grained full-sequence diffusion to finer-grained per-token diffusion. A novel **Hierarchical Gated Linear Attention-based transformer** (HGLA), which dual-factorizes global dependency modeling along the scale and sequential axes, is also integrated into the framework. The proposed GIViC model has been benchmarked against SOTA conventional and neural codecs using a Random Access (RA) configuration (YUV 4:2:0, GOPSize=32), and yields BD-rate savings of 15.94%, 22.46% and 8.52% over VVC VTM, DCVC-FM and NVRC, respectively. As far as we are aware, GIViC is the **first INR-based video codec that outperforms VTM under the RA coding configuration**. The source code will be made available.
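For readers unfamiliar with the BD-rate figures quoted above, the following is a minimal sketch of the commonly used Bjøntegaard delta-rate calculation, assuming the classic cubic-polynomial variant with PSNR as the quality metric. The function name `bd_rate` and the rate-distortion points in the usage lines are hypothetical and purely illustrative; they are not taken from the paper, and the authors' evaluation script may differ (for example, by using piecewise-cubic interpolation or another quality metric). A negative return value means the test codec needs less bitrate than the anchor at equal quality, so a "BD-rate saving of 15.94%" would correspond to a value of about -15.94.

```python
import numpy as np


def bd_rate(rate_anchor, psnr_anchor, rate_test, psnr_test):
    """Average bitrate difference (%) of the test codec relative to the anchor.

    Sketch of the cubic-fit Bjontegaard method; each codec needs at least
    four rate-distortion points.
    """
    log_rate_a = np.log10(np.asarray(rate_anchor, dtype=float))
    log_rate_t = np.log10(np.asarray(rate_test, dtype=float))
    psnr_a = np.asarray(psnr_anchor, dtype=float)
    psnr_t = np.asarray(psnr_test, dtype=float)

    # Fit log10(rate) as a cubic polynomial of PSNR for each codec.
    poly_a = np.polyfit(psnr_a, log_rate_a, 3)
    poly_t = np.polyfit(psnr_t, log_rate_t, 3)

    # Integrate both fits over the overlapping PSNR interval only.
    lo = max(psnr_a.min(), psnr_t.min())
    hi = min(psnr_a.max(), psnr_t.max())
    int_a = np.polyint(poly_a)
    int_t = np.polyint(poly_t)
    avg_a = (np.polyval(int_a, hi) - np.polyval(int_a, lo)) / (hi - lo)
    avg_t = (np.polyval(int_t, hi) - np.polyval(int_t, lo)) / (hi - lo)

    # Convert the mean log-rate gap back to a percentage difference.
    return (10.0 ** (avg_t - avg_a) - 1.0) * 100.0


# Made-up rate-distortion points (kbps, dB) for illustration only.
anchor_rate = [1000.0, 2000.0, 4000.0, 8000.0]
anchor_psnr = [32.0, 35.0, 38.0, 41.0]
# A hypothetical test codec that reaches the same PSNR at 80% of the bitrate.
test_rate = [0.8 * r for r in anchor_rate]
print(round(bd_rate(anchor_rate, anchor_psnr, test_rate, anchor_psnr), 2))
```

Because the hypothetical test codec uses exactly 80% of the anchor bitrate at every quality level, the result is approximately -20, that is, a 20% bitrate saving. With real codecs the two curves rarely share PSNR values, which is why the calculation restricts the integral to the overlapping quality range.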

Subject: ICCV.2025 - Poster