Why Does the LLM Stop Computing: An Empirical Study of User-Reported Failures in Open-Source LLMs

Authors: Guangba Yu, Zirui Wang, Yujie Huang, Renyi Zhong, Yuedong Zhong, Yilun Wang, Michael R. Lyu

The democratization of open-source Large Language Models (LLMs) allows users to fine-tune and deploy models on local infrastructure but exposes them to a First Mile deployment landscape. Unlike black-box API consumption, the reliability of user-managed orchestration remains a critical blind spot. To bridge this gap, we conduct the first large-scale empirical study of 705 real-world failures from the open-source DeepSeek, Llama, and Qwen ecosystems. Our analysis reveals a paradigm shift: white-box orchestration relocates the reliability bottleneck from model algorithmic defects to the systemic fragility of the deployment stack. We identify three key phenomena: (1) Diagnostic Divergence: Runtime crashes distinctively signal infrastructure friction, whereas incorrect functionality serves as a signature for internal tokenizer defects. (2) Systemic Homogeneity: Root causes converge across divergent series, confirming reliability barriers are inherent to the shared ecosystem rather than specific architectures. (3) Lifecycle Escalation: Barriers escalate from intrinsic configuration struggles during fine-tuning to compounded environmental incompatibilities during inference. Supported by our publicly available dataset, these insights provide actionable guidance for enhancing the reliability of the LLM landscape.

Subjects: Software Engineering, Artificial Intelligence, Distributed, Parallel, and Cluster Computing

Publish: 2026-01-20 06:42:56 UTC

2601.13655
