2025.findings-emnlp.301@ACL

Total: 1

#1 Long-context Language Models Fail in Basic Retrieval Tasks Without Sufficient Reasoning Steps

Authors: Yijiong Yu, Zhixiao Qi, Yongfeng Huang, Wei Wang, Weifeng Liu, Ran Chen, Ji Pei

Long-context language models (LCLMs), characterized by their extensive context windows, are becoming increasingly popular. However, although they perform nearly perfectly on standard long-context retrieval tasks, our evaluations show that they fail on some basic cases. We further find that these failures can be effectively addressed with a sufficient number of reasoning steps, guided by specific CoT prompts. This result highlights that certain long-context tasks may need to be solved with long-CoT methods, whereas previous long-context benchmarks ignore the need for long reasoning and treat such tasks as direct QA. Our code and datasets are available at https://github.com/yuyijiong/hard_retrieval_for_llm
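The abstract contrasts treating retrieval as direct QA with prompting that elicits a sufficient number of explicit reasoning steps. Below is a minimal, hypothetical sketch of that contrast on a synthetic key-value retrieval task; the prompt wording and the `build_haystack` helper are illustrative assumptions, not the authors' actual prompts or dataset (see the linked repository for those).

```python
# Sketch only: contrasts a direct-QA prompt with a CoT prompt that forces
# step-by-step retrieval. Prompt text and data generation are assumptions,
# not taken from the paper or its repository.
import random


def build_haystack(num_records: int = 200, seed: int = 0) -> tuple[str, str, str]:
    """Build a synthetic long context of key-value records; return (context, key, value)."""
    rng = random.Random(seed)
    keys = [f"ID-{rng.randrange(10**6):06d}" for _ in range(num_records)]
    vals = [f"{rng.randrange(10**6):06d}" for _ in range(num_records)]
    records = [f"The value of {k} is {v}." for k, v in zip(keys, vals)]
    target = rng.randrange(num_records)
    return "\n".join(records), keys[target], vals[target]


# Direct QA: the model must answer in one shot, with no intermediate steps.
DIRECT_QA_TEMPLATE = (
    "{context}\n\n"
    "Question: What is the value of {key}? Answer with the value only."
)

# CoT-style prompt: forces explicit intermediate retrieval steps before answering.
COT_TEMPLATE = (
    "{context}\n\n"
    "Question: What is the value of {key}?\n"
    "First, quote every line in the records that mentions {key}.\n"
    "Then, reason step by step about which quoted line answers the question.\n"
    "Finally, state the value on its own line, prefixed with 'Answer:'."
)

if __name__ == "__main__":
    context, key, value = build_haystack()
    # Both prompts would be sent to an LCLM; only the CoT variant
    # allots reasoning steps before the final answer.
    print(DIRECT_QA_TEMPLATE.format(context="<long context omitted>", key=key))
    print(COT_TEMPLATE.format(context="<long context omitted>", key=key))
    print("Gold answer:", value)
```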

Subject: EMNLP.2025 - Findings