2601.22699

Models Know Models Best: Evaluation via Model-Preferred Formats

Authors: Joonhak Lee, Sungmok Jung, Jongyeon Park, Jaejin Lee

Performance of Large Language Models (LLMs) on multiple-choice tasks differs markedly between symbol-based and cloze-style evaluation formats. The observed discrepancies are systematically attributable to task characteristics: natural language continuation benefits from likelihood scoring, whereas explicit comparison is better suited to symbol-based selection. These trends are consistent across various decoder-based LLMs, indicating model-agnostic effects. To address these inconsistencies, a dynamic format-alignment strategy is introduced that employs a lightweight classifier trained on latent model-preference signals. In contrast to human-designed heuristics, which often degrade performance, this approach uses model-generated signals to determine the optimal format for each problem instance. The proposed method achieves substantial and consistent improvements in zero-shot accuracy across reasoning and knowledge benchmarks, better revealing the models' latent capabilities.
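To make the two formats concrete, here is a minimal Python sketch of how cloze-style and symbol-based scoring differ, and how a per-instance router could choose between them. It is an illustration under stated assumptions, not the authors' implementation: `LogProbFn`, `toy_logprob`, the prompt templates, and the `prefers_symbol` callback (a stand-in for the lightweight classifier) are all hypothetical names introduced here. In practice `logprob` would wrap a real LLM's log-likelihood of a continuation given a prompt.

```python
from typing import Callable, Sequence

# Hypothetical scorer type: returns log P(continuation | prompt) under some LLM.
LogProbFn = Callable[[str, str], float]

LABELS = "ABCDEFGH"


def cloze_predict(question: str, options: Sequence[str], logprob: LogProbFn) -> int:
    """Cloze-style: score each option's text as a continuation of the question."""
    scores = [logprob(question + " ", opt) for opt in options]
    return max(range(len(options)), key=scores.__getitem__)


def symbol_predict(question: str, options: Sequence[str], logprob: LogProbFn) -> int:
    """Symbol-based: list the options under labels and score each label."""
    lines = [question] + [f"{LABELS[i]}. {opt}" for i, opt in enumerate(options)]
    prompt = "\n".join(lines) + "\nAnswer:"
    scores = [logprob(prompt, " " + LABELS[i]) for i in range(len(options))]
    return max(range(len(options)), key=scores.__getitem__)


def routed_predict(
    question: str,
    options: Sequence[str],
    logprob: LogProbFn,
    prefers_symbol: Callable[[str, Sequence[str]], bool],
) -> int:
    """Pick the format per instance; prefers_symbol stands in for the classifier."""
    predict = symbol_predict if prefers_symbol(question, options) else cloze_predict
    return predict(question, options, logprob)


def toy_logprob(prompt: str, continuation: str) -> float:
    """Toy scorer (assumption, not a real model): favors 'Paris' and label 'C'."""
    favored = {"Paris", "C"}
    return 0.0 if continuation.strip() in favored else -1.0
```

With the toy scorer, the two formats disagree on the same question, which is the kind of discrepancy the abstract describes; the router then decides which answer is used:

```python
opts = ["London", "Paris", "Rome"]
q = "The capital of France is"
cloze_predict(q, opts, toy_logprob)                        # index 1 ("Paris")
symbol_predict(q, opts, toy_logprob)                       # index 2 (label "C")
routed_predict(q, opts, toy_logprob, lambda q, o: False)   # falls back to cloze
```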

Subject: Computation and Language

Published: 2026-01-30 08:15:41 UTC