Could Large Language Models work as Post-hoc Explainability Tools in Credit Risk Models?

Authors: Wenxi Geng, Dingyuan Liu, Liya Li, Yiqing Wang

Post-hoc explainability is central to credit risk model governance, yet widely used tools such as coefficient-based attributions and SHapley Additive exPlanations (SHAP) often produce numerical outputs that are difficult to communicate to non-technical stakeholders. This paper investigates whether large language models (LLMs) can serve as post-hoc explainability tools for credit risk predictions through in-context learning, focusing on two roles: translators and autonomous explainers. Using a personal lending dataset from LendingClub, we evaluate three commercial LLMs: GPT-4-turbo, Claude Sonnet 4, and Gemini-2.0-Flash. Results provide strong evidence for the translator role. In contrast, autonomous explanations show low alignment with model-based attributions. Few-shot prompting improves feature overlap for logistic regression but does not consistently benefit XGBoost, suggesting that LLMs have limited capacity to recover non-linear, interaction-driven reasoning from prompt cues alone. Our findings position LLMs as effective narrative interfaces grounded in auditable model attributions, rather than as substitutes for post-hoc explainers in credit risk model governance. Practitioners should leverage LLMs to bridge the communication gap between complex model outputs and regulatory or business stakeholders, while preserving the rigor and traceability required by credit risk governance frameworks.
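The abstract does not define how "feature overlap" between an LLM's autonomous explanation and the model-based attributions is measured. As a rough illustration only, one plausible reading is a top-k set overlap: take the k features with the largest absolute attribution and check how many of them the LLM also cited. The sketch below assumes that reading; the function names, feature names, and attribution values are all hypothetical and are not taken from the paper.

```python
def top_k_features(attributions, k):
    """Return the k feature names with the largest absolute attribution."""
    ranked = sorted(attributions, key=lambda name: abs(attributions[name]), reverse=True)
    return ranked[:k]


def feature_overlap(model_attributions, llm_features, k=3):
    """Fraction of the model's top-k features that also appear in the LLM's first k cited features."""
    model_top = set(top_k_features(model_attributions, k))
    llm_top = set(llm_features[:k])
    return len(model_top & llm_top) / k


# Hypothetical attribution values for one applicant (illustrative, not from the paper).
shap_values = {"dti": 0.42, "fico_score": -0.35, "annual_inc": -0.10, "loan_amnt": 0.05}

# Hypothetical features an LLM cited when explaining the same prediction unaided.
llm_cited = ["fico_score", "loan_amnt", "dti"]

overlap = feature_overlap(shap_values, llm_cited, k=3)
print(f"top-3 feature overlap: {overlap:.2f}")
```

A score near 1 would indicate that the LLM names the same drivers as the attribution method, while a low score corresponds to the weak alignment the abstract reports for autonomous explanations. Other definitions, such as rank correlation or Jaccard similarity, are equally plausible.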

Subjects: Risk Management, Machine Learning

Publish: 2026-02-21 16:35:06 UTC

2602.18895