Skywork-Reward: Bag of Tricks for Reward Modeling in LLMs | Cool Papers

Authors: Chris Yuhao Liu; Liang Zeng; Jiacai Liu; Rui Yan; Jujie He; Chaojie Wang; Shuicheng Yan; Yang Liu; Yahui Zhou

In this report, we introduce a collection of methods to enhance reward modeling for LLMs, focusing specifically on data-centric techniques. We propose effective data selection and filtering strategies for curating high-quality open-source preference datasets, culminating in the Skywork-Reward data collection, which contains only 80K preference pairs, significantly fewer than existing datasets. Using this curated dataset, we developed the Skywork-Reward model series, comprising Skywork-Reward-Gemma-27B and Skywork-Reward-Llama-3.1-8B, with the former currently holding the top position on the RewardBench leaderboard. Notably, our techniques and datasets have directly enhanced the performance of many top-ranked models on RewardBench, highlighting the practical impact of our contributions in real-world preference learning applications.
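
The abstract does not state the training objective or the evaluation rule. As an assumed illustration only, a common way to train a reward model on preference pairs is the Bradley-Terry pairwise loss, and RewardBench-style evaluation counts a pair as correct when the chosen response receives a higher reward than the rejected one. The sketch below works on plain lists of scalar rewards; the function names `bradley_terry_loss` and `pairwise_accuracy` are hypothetical, and counting ties as incorrect is an assumption.

```python
import math
from typing import Sequence


def bradley_terry_loss(r_chosen: Sequence[float], r_rejected: Sequence[float]) -> float:
    """Mean of -log(sigmoid(r_chosen - r_rejected)) over preference pairs.

    Assumed Bradley-Terry objective; not confirmed as this paper's exact loss.
    """
    if len(r_chosen) != len(r_rejected) or not r_chosen:
        raise ValueError("need equal-length, non-empty reward lists")
    total = 0.0
    for c, r in zip(r_chosen, r_rejected):
        margin = c - r
        # -log(sigmoid(m)) = softplus(-m), computed in a numerically stable form
        total += max(-margin, 0.0) + math.log1p(math.exp(-abs(margin)))
    return total / len(r_chosen)


def pairwise_accuracy(r_chosen: Sequence[float], r_rejected: Sequence[float]) -> float:
    """Fraction of pairs where the chosen response scores strictly higher (ties count as wrong)."""
    if len(r_chosen) != len(r_rejected) or not r_chosen:
        raise ValueError("need equal-length, non-empty reward lists")
    wins = sum(1 for c, r in zip(r_chosen, r_rejected) if c > r)
    return wins / len(r_chosen)


if __name__ == "__main__":
    chosen = [2.0, 0.5, 1.0]
    rejected = [1.0, 1.5, 1.0]
    print(pairwise_accuracy(chosen, rejected))  # only the first pair is a win
    print(bradley_terry_loss(chosen, rejected))
```

A zero reward margin gives a loss of log 2, and the loss shrinks as the chosen response's margin over the rejected one grows, which is what pushes the model to rank preferred responses higher.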

Subjects: Artificial Intelligence; Computation and Language

Published: 2024-10-24 06:06:26 UTC

arXiv: 2410.18451