42321@AAAI

Total: 1

#1 Adapting Hybrid Parallel-Head Large Language Models for Southeast Asia

Author: Kin Meng Ng

Large language models (LLMs) have rapidly advanced, but their growing compute demands limit accessibility in under-resourced regions like Southeast Asia (SEA). While hybrid architectures combining Attention and State-Space Models (SSMs) offer efficiency gains, most rely on sequential interleaving, leaving the potential of parallel-head mixing largely under-explored. However, the recent Falcon-H1 family of models has demonstrated that parallel-head hybrid architectures are not only viable but also scalable to state-of-the-art levels. I propose investigating this parallel-head architecture as a foundation for efficient, multilingual SEA LLMs. My short-term goal is to adapt Falcon-H1-1.5B via vocabulary expansion and continued pretraining, mitigating token fragmentation and enabling low-resource adaptation to 9 SEA languages. In the longer term, I will develop a dynamic token routing mechanism to optimize token-level compute allocation within hybrid layers, aiming to maximize efficiency without sacrificing the expressive power needed for complex multilingual contexts. Evaluation will use the SEA-HELM framework to assess whether these parallel-hybrid innovations can democratize access to high-performance AI for SEA communities.
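To make the parallel-head mixing idea concrete, below is a minimal PyTorch sketch in which an attention path and an SSM-style path process the same hidden states side by side within a single block, with a learned mixer fusing their outputs. The class name ParallelHybridBlock, the use of a GRU as a compact stand-in for the SSM path, and all dimensions are illustrative assumptions, not the Falcon-H1 implementation.

```python
import torch
import torch.nn as nn


class ParallelHybridBlock(nn.Module):
    """Hypothetical parallel-head hybrid block (not Falcon-H1's actual code).

    An attention path and an SSM-style recurrent path run in parallel on the
    same normalized input, rather than being interleaved across layers, and a
    linear mixer fuses the two outputs before the residual connection.
    """

    def __init__(self, d_model: int, n_heads: int):
        super().__init__()
        self.norm = nn.LayerNorm(d_model)
        # Attention path: standard causal multi-head self-attention.
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        # SSM-style path: a GRU is used here only as a short recurrent
        # placeholder; a real hybrid would use a selective SSM (e.g. Mamba).
        self.ssm = nn.GRU(d_model, d_model, batch_first=True)
        # Learned mixer over the concatenated parallel outputs.
        self.mix = nn.Linear(2 * d_model, d_model)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        h = self.norm(x)
        seq_len = h.size(1)
        # Causal mask so the attention path cannot look at future tokens.
        causal_mask = torch.triu(
            torch.ones(seq_len, seq_len, dtype=torch.bool, device=h.device),
            diagonal=1,
        )
        attn_out, _ = self.attn(h, h, h, attn_mask=causal_mask, need_weights=False)
        # The recurrent path sees the same input in parallel with attention,
        # not sequentially after it.
        ssm_out, _ = self.ssm(h)
        mixed = self.mix(torch.cat([attn_out, ssm_out], dim=-1))
        return x + mixed  # residual connection


if __name__ == "__main__":
    block = ParallelHybridBlock(d_model=256, n_heads=4)
    tokens = torch.randn(2, 16, 256)  # (batch, sequence length, hidden size)
    print(block(tokens).shape)        # torch.Size([2, 16, 256])
```

A token routing mechanism of the kind proposed above would sit in front of such a block, deciding per token how much compute each path receives; this sketch only illustrates the parallel, rather than interleaved, arrangement of the two paths.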

Subject: AAAI.2026 - Undergraduate Consortium