2606.02430


Not All Errors Are Equal: A Systematic Study of Error Propagation in Large Language Model Inference

Authors: Yafan Huang, Sheng Di, Guanpeng Li

Large language models (LLMs) are increasingly integrated into high-performance computing (HPC) workflows, accelerating scientific discovery in diverse applications such as code generation and domain-specific decision-making. Yet how soft errors propagate through and affect LLM inference remains largely unexplored. To bridge this gap, we present a comprehensive study of error propagation in LLM inference, enabled by LLMFI, our configurable and deterministic fault-injection framework. Using LLMFI, we systematically inject faults across three open-weight LLMs and thirteen representative tasks covering the reasoning, multilingual, mathematical, and coding domains. We also conduct fine-grained case studies that reveal critical vulnerability patterns. Overall, our study yields 17 takeaways that advance the understanding of error propagation in LLM inference, and it identifies four low-overhead, software-only directions for improving reliability, offering practical guidance for future error detection and mitigation.
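The abstract describes LLMFI only at a high level. As a hedged illustration of what a "configurable and deterministic" soft-error injection can mean, the sketch below flips a single bit in the IEEE-754 float32 encoding of one value, choosing the element and the bit position from a seed so that a rerun reproduces exactly the same fault. This is not LLMFI's interface: the single-bit-flip fault model and the names `flip_bit_fp32` and `inject_fault` are assumptions made for this example.

```python
import random
import struct
from typing import List, Tuple


def flip_bit_fp32(value: float, bit: int) -> float:
    """Return `value` (rounded to float32) with one bit of its IEEE-754 encoding flipped.

    Bit 0 is the least significant mantissa bit, bits 23-30 are the exponent,
    and bit 31 is the sign.
    """
    if not 0 <= bit < 32:
        raise ValueError("bit must be in [0, 31]")
    # Reinterpret the float32 bytes as an unsigned 32-bit integer.
    (raw,) = struct.unpack("<I", struct.pack("<f", value))
    raw ^= 1 << bit  # the injected soft error
    (flipped,) = struct.unpack("<f", struct.pack("<I", raw))
    return flipped


def inject_fault(values: List[float], seed: int) -> Tuple[List[float], int, int]:
    """Hypothetical injector: derive (element, bit) from `seed`, flip that bit
    in a copy of `values`, and return (faulty copy, element index, bit position).
    The same seed always produces the same fault."""
    rng = random.Random(seed)
    idx = rng.randrange(len(values))
    bit = rng.randrange(32)
    faulty = list(values)
    faulty[idx] = flip_bit_fp32(values[idx], bit)
    return faulty, idx, bit


if __name__ == "__main__":
    activations = [0.5, 2.0, -3.0, 0.0]
    print(inject_fault(activations, seed=7))
    print(inject_fault(activations, seed=7))  # identical fault on rerun
    print(flip_bit_fp32(1.0, 0))   # low mantissa bit: tiny perturbation
    print(flip_bit_fp32(1.0, 30))  # high exponent bit: 1.0 becomes inf
```

The last two calls hint at why not all errors are equal: flipping the lowest mantissa bit of 1.0 changes it by only 2^-23, whereas flipping the top exponent bit turns it into infinity.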

Subjects: Distributed, Parallel, and Cluster Computing, Artificial Intelligence

Published: 2026-06-01 16:04:51 UTC