Optimal Batch Size Scheduling via Functional Scaling Laws
Introduces a principled framework for Batch Size Scheduling (BSS) based on functional scaling laws. The paper identifies a "fast catch-up" effect and shows that, for hard tasks, the optimal schedule keeps the batch size small for most of training and switches to large batches only in the late stage, substantially reducing data consumption without sacrificing performance.
Paper: arXiv:2602.14208
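
To make the "small batches first, large batches late" idea concrete, here is a minimal sketch of a two-stage batch size schedule and the number of samples it consumes. It is not the paper's method: the function names, the batch sizes (32 and 512), and the switch point (80% of training) are hypothetical values chosen only for illustration, and the sketch says nothing about the resulting loss.

```python
def two_stage_batch_size(step, total_steps, small=32, large=512, switch_frac=0.8):
    """Batch size at `step` under a hypothetical two-stage schedule.

    All defaults are illustrative assumptions, not values from the paper.
    """
    switch_step = int(switch_frac * total_steps)
    # Small batches before the switch point, large batches after it.
    return small if step < switch_step else large


def samples_consumed(total_steps, **schedule_kwargs):
    """Total training samples drawn over `total_steps` optimizer steps."""
    return sum(
        two_stage_batch_size(step, total_steps, **schedule_kwargs)
        for step in range(total_steps)
    )


total_steps = 1000
late_switch = samples_consumed(total_steps)  # 800 small steps + 200 large steps
always_large = total_steps * 512             # constant large batch, same step count

print(f"late-switch schedule: {late_switch} samples")
print(f"constant large batch: {always_large} samples")
```

With the same number of optimizer steps, the late-switch schedule draws far fewer samples than a constant large batch; whether it reaches the same final performance is the paper's claim and is not tested by this arithmetic.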