LEAF: Growing Trees Without Branching for Speech-Aware Large Language Model Post-Training

Authors: Argyrios Gerogiannis, Yekaterina Yegorova, Mark Hasegawa-Johnson, Venugopal V. Veeravalli

State-of-the-art GRPO-style methods for speech-aware large language model post-training suffer from coarse credit assignment, broadcasting the same terminal-reward advantage to every token in a response. This ignores useful structure within rollout batches, where speech-conditioned completions often share prefixes before diverging at important decisions. We propose Low-rank Exploration with Adaptive Forking (LEAF), a retrospective tree-based RL method that recovers this structure without online branching or additional decoding. LEAF samples complete responses, selects high-surprisal boundaries, groups responses by shared prefixes, and assigns span-level advantages using descendant rewards. We theoretically justify LEAF's span-level credit assignment and boundary-selection design. Empirically, LEAF improves over GRPO across speech question answering and speech translation benchmarks under the same rollout and low-rank adaptation budget. Notably, smaller LEAF-trained models outperform current state-of-the-art, full-parameter baselines.
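The retrospective grouping described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `span_advantages`, the use of fixed token positions as boundaries, and the value definition (mean terminal reward over all sampled responses sharing a prefix) are assumptions made for illustration. In LEAF the boundaries are selected from token surprisal, and the abstract does not specify that selection rule, so the sketch takes them as an input.

```python
from collections import defaultdict
from statistics import mean


def span_advantages(responses, rewards, boundaries):
    """Assign one advantage per span from already-sampled responses.

    responses:  list of token-id lists (complete rollouts for one prompt)
    rewards:    terminal reward for each response
    boundaries: sorted token positions where spans end (assumed given here)
    """
    def cuts(resp):
        # Span edges: start of response, each boundary inside it, its end.
        inner = [b for b in boundaries if 0 < b < len(resp)]
        return [0] + inner + [len(resp)]

    # Group rewards by shared prefix; no extra decoding is needed because
    # the tree is recovered from the finished rollouts.
    groups = defaultdict(list)
    for resp, r in zip(responses, rewards):
        for c in cuts(resp):
            groups[tuple(resp[:c])].append(r)

    # Assumed value of a prefix: mean reward of the responses descending from it.
    value = {prefix: mean(rs) for prefix, rs in groups.items()}

    # Assumed span advantage: value after the span minus value before it.
    result = []
    for resp in responses:
        cs = cuts(resp)
        result.append([
            (start, end, value[tuple(resp[:end])] - value[tuple(resp[:start])])
            for start, end in zip(cs, cs[1:])
        ])
    return result


# Toy rollout batch: two responses share the prefix [1, 2], one diverges early.
responses = [[1, 2, 3, 4], [1, 2, 5, 6], [1, 7, 8, 9]]
rewards = [1.0, 0.0, 0.0]
adv = span_advantages(responses, rewards, boundaries=[1, 2])
```

Under these assumptions, the span advantages of a unique response sum to its reward minus the batch mean reward, which is the unnormalized GRPO-style advantage. The sketch therefore redistributes the same total credit across spans rather than broadcasting it to every token.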

Subjects: Machine Learning, Artificial Intelligence, Computation and Language

Publish: 2026-05-29 15:50:50 UTC

2606.07610
