Understanding Chain-of-Thought Effectiveness in Code Generation: An Empirical and Information-Theoretic Analysis

Authors: Naizhu Jin, Zhong Li, Guang Yang, Tian Zhang, Qingkai Zeng

Large language models (LLMs) achieve strong performance on code generation, but the mechanisms by which Chain-of-Thought (CoT) prompting helps remain unclear. We present a systematic empirical and information-theoretic study of CoT effectiveness in neural code generation, using conditional mutual information $I(Y;C \mid X)$ as a conceptual lens. We evaluate five paradigms (Zero-Shot, Zero-Shot CoT, Self-Planning, Structured CoT, Reasoning-CoT) across six Python benchmarks, a multilingual benchmark covering 12 programming languages, and six models ranging from 7B to 480B parameters. Our results show that externally guided CoT consistently outperforms direct generation: structured methods improve Pass@1 by 5–12% on average while using substantially fewer tokens than reflective reasoning. CoT benefits also depend on the type system of the target language and on model capacity. We further find that reasoning quality is critical: high-quality structured CoT from strong generators yields significantly higher accuracy than lightweight alternatives using the same template, whereas naive Zero-Shot CoT can even degrade performance. These findings offer practical guidance for choosing a CoT strategy based on model capacity, language characteristics, and task complexity.
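As a hedged illustration of the quantity named above, the sketch below computes $I(Y;C \mid X)$ exactly for a small discrete joint distribution, reading X as the problem, C as the reasoning chain, and Y as the generated program. This is not the paper's estimation procedure, which the abstract does not describe. The function name and both toy distributions are hypothetical, chosen only to show the two extremes: a chain that fully determines the program given the problem, and a chain that adds nothing beyond the problem.

```python
import math
from collections import defaultdict


def conditional_mutual_information(joint):
    """Return I(Y; C | X) in bits for a discrete joint {(x, c, y): p}.

    Uses the identity
        I(Y;C|X) = sum p(x,c,y) * log2( p(x,c,y) * p(x) / (p(x,c) * p(x,y)) ).
    Hypothetical helper for illustration only.
    """
    # Accumulate the marginals p(x), p(x, c) and p(x, y) from the joint.
    p_x = defaultdict(float)
    p_xc = defaultdict(float)
    p_xy = defaultdict(float)
    for (x, c, y), p in joint.items():
        p_x[x] += p
        p_xc[(x, c)] += p
        p_xy[(x, y)] += p

    total = 0.0
    for (x, c, y), p in joint.items():
        if p > 0:  # zero-probability outcomes contribute nothing
            total += p * math.log2(p * p_x[x] / (p_xc[(x, c)] * p_xy[(x, y)]))
    return total


# Toy case 1 (hypothetical): the program Y simply copies the chain C,
# and C is uniform and independent of the problem X.
chain_decides = {(x, c, c): 0.25 for x in (0, 1) for c in (0, 1)}

# Toy case 2 (hypothetical): the program Y is fixed by the problem X,
# so the chain C carries no extra information about Y.
chain_irrelevant = {(x, c, x): 0.25 for x in (0, 1) for c in (0, 1)}

if __name__ == "__main__":
    print(conditional_mutual_information(chain_decides))
    print(conditional_mutual_information(chain_irrelevant))
```

In the first toy case the chain carries one full bit about the program beyond what the problem already determines; in the second the program is fixed by the problem, so the chain contributes zero bits. Real code-generation distributions are far too large to enumerate this way, which is one reason a study might treat the quantity as a conceptual lens rather than compute it directly.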

Subject: Software Engineering

Published: 2025-12-10 14:25:46 UTC

2512.09679