Recent work shows strong transfer learning capability to unseen languages in sequence-to-sequence neural networks, under the assumption that we have high-quality word representations for the target language. We evaluate whether this direction is a viable path forward for translation from low-resource languages by investigating how much data is required to learn such high-quality word representations. We first show that learning word embeddings separately from a translation model enables rapid adaptation to new languages with only a few hundred sentences of parallel data. To see whether the current bottleneck in transfer to low-resource languages lies mainly with learning the word representations, we then train word embedding models on varying amounts of data and plug them into a machine translation model. We show that in this simulated low-resource setting, with only 500 parallel sentences and 31,250 sentences of monolingual data, we can exceed 15 BLEU on Flores for unseen languages. Finally, we investigate why the results are less favorable on a real low-resource language and find fault with the publicly available multilingual language modelling datasets.
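To make the decoupling concrete, the following is a minimal sketch (not the paper's code) of the idea of learning word embeddings separately and plugging them into a translation model: a toy PyTorch seq2seq model whose source embedding table is overwritten with externally trained vectors and frozen, so that only the remaining parameters need to be adapted on the small parallel corpus. All sizes, names, and the GRU architecture are illustrative assumptions.

```python
import torch
import torch.nn as nn

VOCAB_SIZE, EMB_DIM, HID_DIM = 8000, 300, 512  # hypothetical sizes

class TinySeq2Seq(nn.Module):
    """Toy encoder-decoder used only to illustrate swapping in embeddings."""
    def __init__(self, vocab_size, emb_dim, hid_dim):
        super().__init__()
        self.src_embed = nn.Embedding(vocab_size, emb_dim)
        self.tgt_embed = nn.Embedding(vocab_size, emb_dim)
        self.encoder = nn.GRU(emb_dim, hid_dim, batch_first=True)
        self.decoder = nn.GRU(emb_dim, hid_dim, batch_first=True)
        self.out = nn.Linear(hid_dim, vocab_size)

    def forward(self, src, tgt):
        _, h = self.encoder(self.src_embed(src))
        dec_out, _ = self.decoder(self.tgt_embed(tgt), h)
        return self.out(dec_out)

model = TinySeq2Seq(VOCAB_SIZE, EMB_DIM, HID_DIM)

# Plug in word embeddings learned separately on monolingual data of the
# new source language (placeholder tensor here), then freeze them so the
# translation model can be adapted with only a few hundred parallel sentences.
pretrained_vectors = torch.randn(VOCAB_SIZE, EMB_DIM)  # stand-in for loaded embeddings
model.src_embed.weight.data.copy_(pretrained_vectors)
model.src_embed.weight.requires_grad = False
```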