Scaling Reasoning Tokens via RL and Parallel Thinking: Evidence From Competitive Programming

Empirically demonstrates that accuracy grows log-linearly with the number of reasoning tokens generated as test-time compute is scaled, validated on competitive programming benchmarks.
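A log-linear relationship means accuracy rises roughly in proportion to the logarithm of the token budget, acc(T) ≈ a + b·ln T. Each doubling of T therefore adds about the same increment, b·ln 2. The Python sketch below fits that form with ordinary least squares. The function names, the natural-log base, and the synthetic data are assumptions for illustration only. They are not the paper's fitting procedure or its measured numbers.

```python
import math

def fit_log_linear(tokens, accuracy):
    """Ordinary least-squares fit of accuracy ~= a + b * ln(tokens).

    Hypothetical helper for illustration; not taken from the paper.
    """
    if len(tokens) != len(accuracy) or len(tokens) < 2:
        raise ValueError("need at least two paired observations")
    xs = [math.log(t) for t in tokens]  # regress on the log of the token count
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(accuracy) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("token counts must not all be equal")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, accuracy))
    b = sxy / sxx            # slope: accuracy gained per unit of ln(tokens)
    a = mean_y - b * mean_x  # intercept
    return a, b

def gain_per_doubling(b):
    """Predicted accuracy change when the token budget doubles: b * ln 2."""
    return b * math.log(2)

# Synthetic, noise-free points generated from a = 0.1, b = 0.05.
# These are illustrative values, not results reported in the paper.
budgets = [1_000, 2_000, 4_000, 8_000, 16_000]
acc = [0.1 + 0.05 * math.log(t) for t in budgets]
a, b = fit_log_linear(budgets, acc)
print(round(a, 6), round(b, 6))  # prints 0.1 0.05
print(round(gain_per_doubling(b), 4))
```

On noise-free synthetic data the fit recovers the generating coefficients exactly. With real measurements, the slope b summarizes how much extra accuracy each doubling of the reasoning budget buys.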

Combines reinforcement learning with parallel thinking to push the reasoning frontier. By Zhang, Guo, Ren, Chen, Ding, Xin, and Xiao (ByteDance Seed, with Princeton, UC Berkeley, and Stanford).

Paper (arXiv)

arXiv HTML

reasoning, reinforcement-learning, research