POLAR
Policy Discriminative Learning (POLAR): a pre-training approach that frames reward modeling as distinguishing between different policies. It achieves significant improvements in preference accuracy across tasks and follows predictable scaling laws relating compute to performance.
Paper
arXiv: 2507.05197