2025.emnlp-main.1310@ACL


Less is More: The Effectiveness of Compact Typological Language Representations

Authors: York Hay Ng, Phuong Hanh Hoang, En-Shiun Annie Lee

Linguistic feature datasets such as URIEL+ are valuable for modelling cross-lingual relationships, but their high dimensionality and sparsity, especially for low-resource languages, limit the effectiveness of distance metrics. We propose a pipeline to optimize the URIEL+ typological feature space by combining feature selection and imputation, producing compact yet interpretable typological representations. We evaluate these feature subsets on linguistic distance alignment and downstream tasks, demonstrating that reduced-size representations of language typology can yield more informative distance metrics and improve performance in multilingual NLP applications.
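To make the idea of "imputation plus feature selection, then distances" concrete, here is a minimal sketch under loudly stated assumptions. It is not the authors' pipeline. The toy matrix, the language names (`lang_a` to `lang_d`), and the helper functions are hypothetical. Column-mean imputation, top-k variance selection, and Euclidean distance are simple stand-ins chosen for illustration; the paper's actual imputation method, selection criterion, ordering of the two steps, and distance metric may all differ.

```python
import math

# Hypothetical toy data: rows are languages, columns are binary typological
# features (1.0 = present, 0.0 = absent, None = missing). Not real URIEL+ values.
LANGS = ["lang_a", "lang_b", "lang_c", "lang_d"]
MATRIX = [
    [1.0, 1.0, None, 1.0, 1.0],
    [1.0, 0.0, 1.0, None, 1.0],
    [0.0, 0.0, 0.0, 0.0, 1.0],
    [0.0, 0.0, 0.0, None, None],
]

# Assumption: mean imputation and top-k variance selection are illustrative
# stand-ins, not the method used in the paper.


def impute_column_means(matrix):
    """Fill each missing cell with the mean of the observed values in its column."""
    n_cols = len(matrix[0])
    means = []
    for j in range(n_cols):
        observed = [row[j] for row in matrix if row[j] is not None]
        means.append(sum(observed) / len(observed) if observed else 0.0)
    return [[means[j] if v is None else v for j, v in enumerate(row)] for row in matrix]


def column_variance(matrix, j):
    """Population variance of column j in a fully observed matrix."""
    col = [row[j] for row in matrix]
    mu = sum(col) / len(col)
    return sum((x - mu) ** 2 for x in col) / len(col)


def select_top_variance(matrix, k):
    """Keep the k highest-variance columns; return their indices and the reduced matrix."""
    n_cols = len(matrix[0])
    # Sort by descending variance; ties are broken by the lower column index.
    ranked = sorted(range(n_cols), key=lambda j: (-column_variance(matrix, j), j))
    keep = sorted(ranked[:k])
    return keep, [[row[j] for j in keep] for row in matrix]


def euclidean_distance(u, v):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


dense = impute_column_means(MATRIX)
kept, compact = select_top_variance(dense, k=3)

# Pairwise distances between languages on the compact feature subset.
distances = {
    (LANGS[i], LANGS[j]): euclidean_distance(compact[i], compact[j])
    for i in range(len(LANGS))
    for j in range(i + 1, len(LANGS))
}
print("kept feature columns:", kept)
for pair, d in distances.items():
    print(pair, round(d, 3))
```

On this toy matrix the selection keeps columns 0, 1 and 2, discarding column 3 (half of its values missing, so low variance after imputation) and column 4 (constant once imputed), which carry little signal for telling languages apart. Imputing before selecting is one design choice here: it lets the variance criterion run on a fully observed matrix.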

Subject: EMNLP.2025 - Main