Generalist reward model with inference-time scaling.

Paper

arXiv: 2504.02495

reasoning, reward-model