Reasoning-based large language models now often produce natural-language thinking traces alongside their answers, but it remains unclear whether their verbalized uncertainties faithfully reflect their knowledge or can be used to improve factuality. We study this question for long-form, knowledge-intensive biography generation. Our pipeline decomposes thinking traces and responses into atomic facts, filters out planning-style content, labels factual reasoning by certainty, and aligns response facts to their supporting reasoning, enabling plan-based filtering, self-verification, and a classifier that predicts factuality from facts and their associated reasoning. Preliminary results suggest that high-certainty reasoning is more likely both to be included in the final response and to be correct, and that structured use of these signals can improve factual precision, though broader validation across models and datasets will be needed.
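The core pipeline steps (decomposing text into atomic facts, labeling certainty, and filtering) can be sketched as follows. This is a minimal illustrative sketch, not the paper's actual implementation: the hedge-word lists and the sentence-based fact splitter are assumptions introduced here for clarity, whereas the real pipeline would use model-based decomposition and alignment.

```python
# Hedged sketch of the certainty-labeling and filtering stages.
# HEDGE_MARKERS and the naive sentence splitter are illustrative
# assumptions, not the paper's method.

HEDGE_MARKERS = {"might", "may", "possibly", "perhaps", "not sure", "i think"}


def split_facts(text: str) -> list[str]:
    """Naively split text into sentence-level 'atomic facts'."""
    return [s.strip() for s in text.split(".") if s.strip()]


def label_certainty(sentence: str) -> str:
    """Label a reasoning sentence as 'high' or 'low' certainty
    based on simple lexical hedge cues."""
    s = sentence.lower()
    return "low" if any(m in s for m in HEDGE_MARKERS) else "high"


def filter_by_certainty(reasoning: str) -> list[str]:
    """Keep only the high-certainty facts from a reasoning trace."""
    return [f for f in split_facts(reasoning) if label_certainty(f) == "high"]
```

For example, `filter_by_certainty("She was born in 1945. She might have studied law.")` would keep only the birth-year fact, dropping the hedged claim about law school.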