Structured Moral Reasoning in Language Models: A Value-Grounded Evaluation Framework

Authors: Mohna Chakraborty, Lu Wang, David Jurgens

Large language models (LLMs) are increasingly deployed in domains requiring moral understanding, yet their reasoning often remains shallow and misaligned with human reasoning. Unlike humans, whose moral reasoning integrates contextual trade-offs, value systems, and ethical theories, LLMs often rely on surface patterns, leading to biased decisions in morally and ethically complex scenarios. To address this gap, we present a value-grounded framework for evaluating and distilling structured moral reasoning in LLMs. We benchmark 12 open-source models across four moral datasets using a taxonomy of prompts grounded in value systems, ethical theories, and cognitive reasoning strategies. Our evaluation is guided by four questions: (1) Does reasoning improve LLM decision-making over direct prompting? (2) Which types of value and ethical frameworks most effectively guide LLM reasoning? (3) Which cognitive reasoning strategies lead to better moral performance? (4) Can small LLMs acquire moral competence through distillation? We find that prompting with explicit moral structure consistently improves accuracy and coherence, with first-principles reasoning and scaffolds that combine Schwartz's values with care ethics yielding the strongest gains. Furthermore, our supervised distillation approach transfers moral competence from large to small models without additional inference cost. Together, our results offer a scalable path toward interpretable and value-grounded models.

Subject: Human-Computer Interaction

Publish: 2025-06-17 19:59:44 UTC

2506.14948
