R7HJj1YvJH@OpenReview

Total: 1

#1 Beyond Expectations: Quantile-Guided Alignment for Risk-Calibrated Language Models

Authors: Xinran Wang, Jin Du, Azal Ahmad Khan, Qi Le, Enmao Diao, Jiawei Zhou, Jie Ding, Ali Anwar

Large language models can generate rare but catastrophic outputs, such as harmful conversations or insecure code. Existing Reinforcement Learning from Human Feedback (RLHF) typically maximizes average reward, leaving high-risk tail events insufficiently controlled. We introduce Quantile-Guided Alignment (QA), a framework that lets users specify desired improvements at any quantile, individually or across multiple reward dimensions, shifting the output distribution toward safer, more desirable outcomes with finer control. The method extends standard RLHF with an augmented reward formulation that enforces quantile constraints. Experiments on conversation and code-generation tasks show that quantile alignment significantly improves quality at the targeted tails while maintaining overall performance. These results position QA as a principled route to risk-calibrated language models with tail-focused alignment.

Subject: NeurIPS.2025 - Spotlight
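
The abstract describes QA as extending standard RLHF with an augmented reward that enforces user-specified quantile constraints. The paper's exact formulation is not reproduced here; the sketch below is only an illustration of that general idea, assuming a batch of scalar rewards and hypothetical `tau`, `target`, and `lam` parameters: tail samples are identified by the batch's empirical quantile, and a weighted shortfall penalty adds extra optimization pressure on outputs below the desired target, rather than only raising the mean reward.

```python
import numpy as np

def augmented_rewards(rewards, tau=0.05, target=0.0, lam=1.0):
    """Quantile-augmented reward shaping (illustrative sketch only,
    not the paper's exact formulation).

    rewards : per-sample scalar rewards for a batch of responses
    tau     : quantile to improve (e.g. 0.05 = the worst 5% of outputs)
    target  : desired minimum reward at that quantile
    lam     : weight on the quantile-constraint penalty
    """
    rewards = np.asarray(rewards, dtype=float)
    # Empirical tau-quantile of the current batch of rewards.
    q_tau = np.quantile(rewards, tau)
    # Shortfall below the user-specified target (zero for good samples).
    shortfall = np.minimum(rewards - target, 0.0)
    # Apply the penalty only to samples in the targeted tail, so the
    # policy update pushes hardest against the worst-case outputs.
    in_tail = (rewards <= q_tau).astype(float)
    return rewards + lam * in_tail * shortfall

# Toy usage: the two worst samples (-1.2 and -0.4) fall in the targeted
# tail and receive extra penalty; the rest are unchanged.
print(augmented_rewards([0.9, 0.7, -1.2, 0.8, -0.4], tau=0.4, target=0.0, lam=2.0))
```

In an RLHF loop, augmented rewards of this kind would replace the raw reward-model scores fed to the policy-gradient step (e.g. PPO), which is one way to shift a targeted tail of the output distribution while leaving the rest of the objective unchanged.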