The frontier of large language model (LLM) development has largely been driven by knowledge-intensive tasks specified in English. In this proposed thesis, I argue for the key role that multilinguality occupies in the development of practical and knowledgeable LLMs. First, I consider practical methods to improve LLMs' performance on standard natural language processing (NLP) tasks by leveraging their existing multilingual knowledge. Then, I investigate the underlying multilingual knowledge of LLMs with two benchmarks: one on complex reasoning, and one on territorial disputes. These benchmarks reveal LLMs' inconsistent performance across languages. I then design efficient techniques, at both inference time and training time, to address these discrepancies. Finally, I extend the territorial disputes benchmark to a retrieval-augmented generation (RAG) setting, comparing the effects of different retrieval configurations on cross-lingual robustness. My proposal shows that the informed use of multilinguality enhances LLMs' capabilities and our understanding thereof.