Instructions are all you need: Self-supervised Reinforcement Learning for Instruction Following

Authors: Qingyu Ren, Qianyu He, Bowei Zhang, Jie Zeng, Jiaqing Liang, Yanghua Xiao, Weikang Zhou, Zeye Sun, Fei Yu

Language models often struggle to follow multi-constraint instructions that are crucial for real-world applications. Existing reinforcement learning (RL) approaches depend on external supervision and suffer from sparse reward signals in multi-constraint tasks. We propose a label-free self-supervised RL framework that removes this dependency by deriving reward signals directly from instructions and generating pseudo-labels for reward model training. Our approach introduces constraint decomposition strategies and efficient constraint-wise binary classification to address sparse rewards while maintaining computational efficiency. Experiments show that our approach generalizes well, achieving strong improvements across 3 in-domain and 5 out-of-domain datasets, including challenging agentic and multi-turn instruction following. The data and code are publicly available at https://github.com/Rainier-rq/verl-if
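
To illustrate why constraint-wise binary judgments can give a denser signal than an all-or-nothing reward, here is a minimal sketch. It is not the authors' implementation: the rule-based checkers, the constraint names, and the mean aggregation are all assumptions made for illustration, standing in for whatever learned reward model and aggregation the paper actually uses.

```python
from typing import Callable, Dict

# Hypothetical: each constraint maps to a binary checker. In the paper a
# trained reward model plays this role; simple rules are used here instead.
Checker = Callable[[str], bool]


def sparse_reward(response: str, checkers: Dict[str, Checker]) -> float:
    """All-or-nothing reward: 1.0 only if every constraint is satisfied."""
    return 1.0 if all(check(response) for check in checkers.values()) else 0.0


def constraint_wise_reward(response: str, checkers: Dict[str, Checker]) -> float:
    """Mean of per-constraint binary judgments (assumed aggregation)."""
    if not checkers:
        return 0.0
    return sum(check(response) for check in checkers.values()) / len(checkers)


# A multi-constraint instruction decomposed into three illustrative constraints.
constraints: Dict[str, Checker] = {
    "at most 10 words": lambda r: len(r.split()) <= 10,
    "mentions 'reward'": lambda r: "reward" in r.lower(),
    "ends with a period": lambda r: r.strip().endswith("."),
}

response = "Dense reward signals help training"

print(sparse_reward(response, constraints))           # all-or-nothing score
print(constraint_wise_reward(response, constraints))  # fraction of constraints met
```

In this toy case the response meets two of the three constraints, so the all-or-nothing reward gives no credit while the constraint-wise score still reflects partial progress.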

Subjects: Computation and Language, Artificial Intelligence

Published: 2025-10-16 08:24:44 UTC

arXiv: 2510.14420
