FBS: Modeling Native Parallel Reading inside a Transformer

2601.21708

Large language models (LLMs) excel across many tasks, yet inference is still dominated by strictly token-by-token autoregression. Existing acceleration methods largely patch this pipeline and miss core human-reading ingredients: content-adaptive foresight, chunk-structure-aware compute allocation, and train-test consistency for preview/skimming. We propose the Fovea-Block-Skip Transformer (FBS), which injects a causal, trainable loop into Transformers via Parafovea-Attention Window (PAW), Chunk-Head (CH), and Skip-Gate (SG). Across diverse benchmarks, FBS improves the quality-efficiency trade-off without increasing parameters, and ablations show the three modules are complementary.
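The abstract says only that the Skip-Gate (SG) helps decide where compute is spent. It does not specify the mechanism. The sketch below is a minimal, generic illustration of per-token compute skipping, in which each token either passes through a residual block or takes the identity path. Everything here is an assumption, not the FBS design: the function `skip_gated_block`, the linear gate (`gate_w`, `gate_b`), the hard `threshold`, and `toy_block` are all hypothetical. This linear gate also adds d + 1 parameters, whereas the paper reports no parameter increase.

```python
import math
from typing import Callable, List, Tuple

Vector = List[float]


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def skip_gated_block(
    hidden: List[Vector],
    block: Callable[[Vector], Vector],
    gate_w: Vector,
    gate_b: float,
    threshold: float = 0.5,
) -> Tuple[List[Vector], int]:
    """Apply `block` as a residual update only to tokens whose gate fires.

    Hypothetical illustration: gate_w, gate_b and threshold are assumed
    parameters of a generic skip gate, not the Skip-Gate from the FBS paper.
    """
    outputs: List[Vector] = []
    skipped = 0
    for h in hidden:
        # Per-token gate score from a (hypothetical) linear probe on the hidden state.
        score = sigmoid(sum(w * x for w, x in zip(gate_w, h)) + gate_b)
        if score < threshold:
            # Skip: identity path, so the block is never evaluated for this token.
            outputs.append(list(h))
            skipped += 1
        else:
            # Compute: standard residual update h + block(h).
            update = block(h)
            outputs.append([x + u for x, u in zip(h, update)])
    return outputs, skipped


def toy_block(h: Vector) -> Vector:
    # Stand-in for an attention/MLP sublayer (hypothetical).
    return [0.1 * x for x in h]


tokens = [[1.0, 0.0], [-2.0, 0.5], [0.3, 0.3]]
out, n_skipped = skip_gated_block(tokens, toy_block, gate_w=[1.0, 0.0], gate_b=0.0)
print(n_skipped)  # → 1
```

A hard threshold like this is not differentiable, so training such a gate usually requires a relaxation or an auxiliary objective. The abstract does not describe how FBS keeps training and inference consistent for its gate.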

Subjects: Artificial Intelligence, Computation and Language

Published: 2026-01-29 13:39:55 UTC