TivTok: Broadcasting Time-Invariant Tokens for Scalable Video Tokenization

Authors: Weiliang Chen, Yuanhui Huang, Xuebo Wang, Yueqi Duan

Video tokenization is fundamental to scalable video generation, as the number of tokens directly determines the computational cost and the length of videos that can be modeled. Existing tokenizers mainly improve scalability by compressing videos into fewer tokens, but they often continue to represent persistent content, such as static backgrounds and consistent object appearances, repeatedly across frames and chunks. In this paper, we propose TivTok (Time-Invariant Tokenizer), a reuse-aware video tokenizer that makes persistent information reusable across time. TivTok represents a clip with Time-Invariant (TIV) tokens that encode information shared across frames and Time-Variant (TV) tokens that encode frame-specific residuals. To obtain this factorization, we introduce Scope-Induced Factorization (SIF), which assigns different attention scopes to the two token groups: TIV tokens attend to the full clip, whereas each TV token only accesses its corresponding frame together with the TIV tokens. In the decoder, Invariant Broadcasting (IB) reuses the same TIV tokens across frames and chunks for parallel reconstruction and long-video tokenization. Experiments show that TivTok achieves an rFVD of 12.65 on the standard 16×256×256 benchmark and improves compression efficiency by 2.91× for 128-frame videos compared with the evaluated baselines, while using only 1.1% of the tokens required by downsample-based tokenizers in our evaluation.
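The attention scopes that the abstract attributes to Scope-Induced Factorization can be illustrated with a boolean attention mask. The sketch below is not the authors' implementation: the function name `sif_attention_mask`, the token layout, and all parameter names are hypothetical. It also assumes that TIV queries see every frame patch plus the other TIV tokens (but not TV tokens), and that TV tokens of the same frame may attend to one another; the abstract does not specify either detail.

```python
import numpy as np


def sif_attention_mask(num_frames, patches_per_frame, num_tiv, tv_per_frame):
    """Boolean mask (True = may attend) for latent queries over all keys.

    Hypothetical layout, chosen only for illustration:
      keys:    [frame patches | TIV tokens | TV tokens]
      queries: [TIV tokens | TV tokens]
    """
    n_patch = num_frames * patches_per_frame
    n_tv = num_frames * tv_per_frame
    mask = np.zeros((num_tiv + n_tv, n_patch + num_tiv + n_tv), dtype=bool)

    # Frame index of every patch key and of every TV token.
    patch_frame = np.repeat(np.arange(num_frames), patches_per_frame)
    tv_frame = np.repeat(np.arange(num_frames), tv_per_frame)

    # TIV queries: full-clip scope (all patches) plus the TIV tokens.
    # Assumption: TIV queries do not attend to TV tokens.
    mask[:num_tiv, : n_patch + num_tiv] = True

    # TV queries: only the patches of their own frame ...
    mask[num_tiv:, :n_patch] = tv_frame[:, None] == patch_frame[None, :]
    # ... together with all TIV tokens ...
    mask[num_tiv:, n_patch : n_patch + num_tiv] = True
    # ... and (assumption) the TV tokens of the same frame.
    mask[num_tiv:, n_patch + num_tiv :] = tv_frame[:, None] == tv_frame[None, :]
    return mask


if __name__ == "__main__":
    # Tiny example: 2 frames, 3 patches per frame, 2 TIV tokens, 1 TV token per frame.
    print(sif_attention_mask(2, 3, 2, 1).astype(int))
```

Under these assumptions, anything a TV token cannot see directly (other frames) can only reach it through the TIV tokens, which is one way to read how restricting scope could push shared content into the TIV group.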

Subject: Computer Vision and Pattern Recognition

Publish: 2026-06-16 06:52:52 UTC

2606.17590
