
Efficient Construction of Model Family through Progressive Training Using Model Expansion

Authors: Kazuki Yano, Sho Takase, Sosuke Kobayashi, Shun Kiyono, Jun Suzuki

As Large Language Models (LLMs) gain widespread practical application, offering model families with varying parameter sizes has become standard practice to accommodate diverse computational requirements. Traditionally, each model in the family is trained independently, incurring computational costs that scale additively with the number of models. In this work, we propose an efficient method for constructing model families via progressive training, where smaller models are incrementally expanded to larger sizes to create a complete model family. Through extensive experiments on a model family ranging from 1B to 8B parameters, we show that our approach reduces total computational cost by approximately 25% while maintaining performance comparable to independently trained models. Moreover, by strategically adjusting the maximum learning rate based on model size, our method outperforms independent training across various metrics. Beyond these improvements, our approach also fosters greater consistency in behavior across model sizes.
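
The abstract does not spell out the expansion operator or the learning-rate rule, so the following is only a minimal Python/PyTorch sketch of the general idea: a trained smaller model initializes a deeper one by duplicating its layers, and the larger model is then trained with a smaller peak learning rate. The class TinyLM, the function expand_depth, the cyclic duplication scheme, and the size-dependent learning-rate formula are all illustrative assumptions, not the authors' actual procedure.

    # Sketch of progressive model expansion (illustrative, not the paper's method).
    import copy
    import torch
    import torch.nn as nn

    class TinyLM(nn.Module):
        """Toy transformer stack: embedding -> N layers -> output head."""
        def __init__(self, vocab_size: int, d_model: int, n_layers: int):
            super().__init__()
            self.embed = nn.Embedding(vocab_size, d_model)
            self.layers = nn.ModuleList(
                nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True)
                for _ in range(n_layers)
            )
            self.head = nn.Linear(d_model, vocab_size)

        def forward(self, x):
            h = self.embed(x)
            for layer in self.layers:
                h = layer(h)
            return self.head(h)

    def expand_depth(small: TinyLM, new_n_layers: int) -> TinyLM:
        """Initialize a deeper model from a smaller trained one by copying the
        embedding/head and cyclically duplicating the existing layers."""
        vocab_size = small.embed.num_embeddings
        d_model = small.embed.embedding_dim
        large = TinyLM(vocab_size, d_model, new_n_layers)
        large.embed.load_state_dict(small.embed.state_dict())
        large.head.load_state_dict(small.head.state_dict())
        for i, layer in enumerate(large.layers):
            src = small.layers[i % len(small.layers)]
            layer.load_state_dict(copy.deepcopy(src.state_dict()))
        return large

    if __name__ == "__main__":
        small = TinyLM(vocab_size=1000, d_model=64, n_layers=2)
        # ... the small model would be (pre)trained here before expansion ...
        large = expand_depth(small, new_n_layers=4)

        # Hypothetical size-dependent peak learning rate, echoing the idea of
        # adjusting the maximum LR per model size; the actual rule is not
        # given in the abstract.
        small_params = sum(p.numel() for p in small.parameters())
        large_params = sum(p.numel() for p in large.parameters())
        peak_lr = 3e-4 * (small_params / large_params) ** 0.5
        optimizer = torch.optim.AdamW(large.parameters(), lr=peak_lr)

        tokens = torch.randint(0, 1000, (2, 16))
        logits = large(tokens)
        print(logits.shape)  # torch.Size([2, 16, 1000])

In this sketch the cost saving comes from reusing the smaller model's weights as the starting point for the larger one, so the larger model does not train from scratch; repeating the expansion step yields each successive size in the family.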

Subject: COLM.2025