Generalist reward model with inference-time scaling.

Paper

Citations 1
reasoning, reward-model