2025.naacl-srw.19@ACL

Multilingual Native Language Identification with Large Language Models

Authors: Dhiman Goswami, Marcos Zampieri, Kai North, Shervin Malmasi, Antonios Anastasopoulos

Native Language Identification (NLI) is the task of automatically identifying the native language (L1) of individuals based on their second language (L2) production. The introduction of Large Language Models (LLMs) with billions of parameters has renewed interest in text-based NLI, with new studies exploring LLM-based approaches to NLI on English L2. The capabilities of state-of-the-art LLMs on non-English NLI corpora, however, have not yet been fully evaluated. To fill this important gap, we present the first evaluation of LLMs for multilingual NLI. We evaluate the performance of several LLMs against traditional statistical machine learning models and language-specific BERT-based models on NLI corpora in English, Italian, Norwegian, and Portuguese. Our results show that fine-tuned GPT-4 models achieve state-of-the-art NLI performance.

Subject: NAACL.2025 - Student Research Workshop