Skywork-Reward
model
Reward model series that topped RewardBench with only 80K curated preference pairs. V2 (Jul 2025, ICLR 2026) scales to 8 models (0.6B-8B) trained on SynPref-40M (26M curated pairs), achieving SOTA across 7 reward model benchmarks. CC BY 4.0.
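These are sequence-classification reward models that map a conversation to a scalar score. A minimal sketch of scoring a response through Hugging Face transformers follows; the repository ID, dtype, and device settings are illustrative assumptions, not an excerpt from the Skywork documentation (check the Hub collection for exact repo names).

```python
# Sketch: scoring a conversation with a Skywork-style reward model.
# The model ID below is an assumed repo name for illustration only.
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

model_id = "Skywork/Skywork-Reward-Llama-3.1-8B-v0.2"  # assumed repo name
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSequenceClassification.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

# The reward model scores the whole conversation; the chat template
# formats user/assistant turns into the model's expected input.
conversation = [
    {"role": "user", "content": "Explain gradient descent in one sentence."},
    {"role": "assistant", "content": "Gradient descent iteratively updates "
     "parameters in the direction that most reduces the loss."},
]
input_ids = tokenizer.apply_chat_template(
    conversation, tokenize=True, return_tensors="pt"
).to(model.device)

with torch.no_grad():
    score = model(input_ids).logits[0][0].item()  # scalar reward
print(f"reward: {score:.3f}")
```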
Outputs (2)
Skywork-Reward V1
model
#1 on RewardBench with 80K preference pairs. Gemma-2-27B and Llama-3.1-8B variants.
arXiv: 2410.18451
Skywork-Reward V2
model
8 models (0.6B-8B), SynPref-40M dataset, SOTA on 7 benchmarks. ICLR 2026.
arXiv: 2507.01352
Venue: ICLR 2026
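Both versions are trained on preference pairs, i.e. (chosen, rejected) response pairs for the same prompt. Reward models of this kind are typically optimized with a Bradley-Terry pairwise objective that pushes the chosen response's reward above the rejected one's; the sketch below is a generic illustration of that loss, not the authors' training code.

```python
# Bradley-Terry pairwise loss: a generic sketch of training on
# (chosen, rejected) preference pairs, not Skywork's exact recipe.
import torch
import torch.nn.functional as F

def bradley_terry_loss(chosen_rewards: torch.Tensor,
                       rejected_rewards: torch.Tensor) -> torch.Tensor:
    """-log sigmoid(r_chosen - r_rejected), averaged over the batch."""
    return -F.logsigmoid(chosen_rewards - rejected_rewards).mean()

# Toy batch: scalar rewards the model assigned to each side of 3 pairs.
chosen = torch.tensor([1.2, 0.4, 2.0])
rejected = torch.tensor([0.3, 0.9, 1.1])
print(bradley_terry_loss(chosen, rejected))  # lower when chosen > rejected
```

The loss is minimized when the reward gap is large and positive, so the model learns a scalar ordering over responses rather than an absolute score scale.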