2025.emnlp-main.961@ACL

PhonoThink: Improving Large Language Models’ Reasoning on Chinese Phonological Ambiguities

Authors: Jianfei Ma, Zhaoxin Feng, Emmanuele Chersoni, Huacheng Song, Ziqi Zhang

Effectively resolving phonological ambiguities is crucial for robust natural language processing, as these ambiguities are pervasive in tasks ranging from speech-to-text and spelling correction to offensive language detection. However, current Large Language Models (LLMs) frequently struggle to resolve such ambiguities. To address this challenge, we present a framework that enhances LLMs’ phonological capability through a multi-stage training approach. Our method begins with supervised fine-tuning on well-constructed datasets, including three subtask datasets designed to enhance the model’s foundational phonological knowledge, along with a synthetic dataset of step-by-step reasoning chains. Following this, we apply reinforcement learning to incentivize and stabilize the model’s reasoning. Results show that our framework enables the base model to achieve performance comparable to that of a much larger model. Our ablation studies reveal that the subtask datasets and the synthetic dataset act as complementary, modular components that together strengthen LLMs’ integrated application of this knowledge.

Subject: EMNLP.2025 - Main