AI Lab Tracker
SRPO: Staged History-Resampling Policy Optimization
paper
2025-04-19
Kuaishou
A two-stage reinforcement learning framework for LLMs that surpasses DeepSeek-R1-Zero-32B on AIME24 and LiveCodeBench while using only about one-tenth of the training steps that model required.
Paper (arXiv)
reasoning
training
efficiency
Related
kat