Scaling Reasoning Tokens via RL and Parallel Thinking
Paper subtitle: "Evidence From Competitive Programming." Empirically demonstrates a log-linear relationship between accuracy and the number of reasoning tokens generated as test-time compute is scaled: accuracy improves roughly linearly in the logarithm of the token count. Validated on competitive programming benchmarks.
Combines reinforcement learning with parallel thinking to push the reasoning frontier. By Zhang, Guo, Ren, Chen, Ding, Xin, and Xiao (ByteDance Seed, Princeton, UC Berkeley, and Stanford).
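A log-linear relationship means accuracy follows roughly acc(T) ≈ a + b·log T for a reasoning-token budget T, so each doubling of the budget adds about the same accuracy gain b. The minimal sketch below fits that form by ordinary least squares. The function names, the choice of log base 2, and the synthetic numbers in the demo are illustrative assumptions. They are not the paper's code or data.

```python
import math
from typing import Sequence, Tuple


def fit_log_linear(tokens: Sequence[float], accuracy: Sequence[float]) -> Tuple[float, float]:
    """Least-squares fit of accuracy ~ a + b * log2(tokens); returns (a, b).

    Hypothetical helper for illustration, not taken from the paper.
    """
    if len(tokens) != len(accuracy) or len(tokens) < 2:
        raise ValueError("need at least two (tokens, accuracy) pairs of equal length")
    xs = [math.log2(t) for t in tokens]  # put the token budget on a log scale
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(accuracy) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("token budgets must not all be equal")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, accuracy))
    b = sxy / sxx            # accuracy gained per doubling of reasoning tokens
    a = mean_y - b * mean_x  # intercept at log2(tokens) = 0
    return a, b


def predict(a: float, b: float, tokens: float) -> float:
    """Accuracy predicted by the fitted log-linear model at a given token budget."""
    return a + b * math.log2(tokens)


if __name__ == "__main__":
    # Synthetic, illustrative values only (NOT results from the paper),
    # generated exactly from accuracy = 0.05 * log2(tokens) - 0.2.
    budgets = [1024, 2048, 4096, 8192, 16384]
    acc = [0.05 * math.log2(t) - 0.2 for t in budgets]
    a, b = fit_log_linear(budgets, acc)
    print(f"a={a:.3f}, b={b:.3f}, predicted acc at 32768 tokens={predict(a, b, 32768):.3f}")
```

Fitting against log2 rather than the raw token count makes the slope b directly readable as "accuracy per doubling," which is the natural unit when budgets are swept in powers of two.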