2025.findings-emnlp.463@ACL

Training Medical QA Models Based on Mixed Rewards from Multiple-Choice and Open-Ended Questions

Authors: Yue Qiu, Yujan Ting, Pei Dong, Terrence Chen, Weijing Huang

Reinforcement learning (RL) for large language models (LLMs) typically requires clear reward signals, which are often unavailable for open-ended (OE) questions, where answer evaluation is ambiguous without scalable expert labeling. We investigate whether LLMs benefit from training on mixed data with varying reward clarity. Our approach combines multiple-choice questions (MCQs), which offer clear binary rewards, with OE questions, for which we use simpler, potentially noisy rewards such as Jaccard similarity or LLM-based evaluators. We hypothesize that MCQs can stabilize training when mixed with OE questions. Our experiments show that this mixed-data approach consistently improves medical question-answering performance across model scales.
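A minimal Python sketch of the mixed-reward scheme the abstract describes. The function names, the exact-match rule for MCQs, and the token-level Jaccard reward for OE answers are illustrative assumptions, not the authors' implementation:

def jaccard_similarity(prediction: str, reference: str) -> float:
    """Token-level Jaccard overlap between a model answer and a reference.

    A simple, potentially noisy proxy reward for open-ended answers
    (assumed tokenization: lowercase whitespace split).
    """
    pred_tokens = set(prediction.lower().split())
    ref_tokens = set(reference.lower().split())
    if not pred_tokens and not ref_tokens:
        return 1.0
    return len(pred_tokens & ref_tokens) / len(pred_tokens | ref_tokens)

def mixed_reward(question_type: str, prediction: str, reference: str) -> float:
    """Clear binary reward for MCQs; noisy similarity reward for OE questions."""
    if question_type == "mcq":
        # Exact match on the chosen option letter, e.g. "B" vs. "B".
        return 1.0 if prediction.strip().upper() == reference.strip().upper() else 0.0
    # Open-ended: fall back to the noisier partial-credit reward.
    return jaccard_similarity(prediction, reference)

# Example: one MCQ sample and one OE sample in the same training batch.
print(mixed_reward("mcq", "B", "B"))  # 1.0 (clean binary signal)
print(mixed_reward("oe", "aspirin reduces fever", "aspirin lowers fever"))  # 0.5 (partial credit)

In an RL loop (e.g., a PPO-style policy-gradient method; the abstract does not name the algorithm), each sampled answer would be scored by mixed_reward, so MCQ samples contribute clean binary signals while OE samples contribute noisier partial-credit signals.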

Subject: EMNLP.2025 - Findings