Introduces a principled framework for Batch Size Scheduling (BSS) based on functional scaling laws. The paper uncovers the "fast catch-up" effect and shows that, for hard tasks, the optimal schedule keeps the batch size small for most of training and switches to large batches only in the late stage, which substantially reduces data consumption without sacrificing performance.
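The two-stage schedule described above can be illustrated with a minimal sketch. This is not the paper's implementation: the function names, the batch sizes, and the `switch_fraction` parameter are all hypothetical, chosen only to show the shape of a "small first, large late" schedule over a fixed sample budget.

```python
def batch_size_at(samples_seen, total_samples, small_batch, large_batch, switch_fraction):
    """Return the batch size for the next step of a two-stage schedule.

    All parameters are illustrative assumptions, not values from the paper.
    The small batch is used until `switch_fraction` of the sample budget
    has been consumed; the large batch is used afterwards.
    """
    if not 0.0 <= switch_fraction <= 1.0:
        raise ValueError("switch_fraction must lie in [0, 1]")
    switch_point = switch_fraction * total_samples
    return small_batch if samples_seen < switch_point else large_batch


def build_schedule(total_samples, small_batch, large_batch, switch_fraction):
    """List the per-step batch sizes that consume exactly `total_samples`."""
    sizes = []
    seen = 0
    while seen < total_samples:
        size = batch_size_at(seen, total_samples, small_batch, large_batch, switch_fraction)
        size = min(size, total_samples - seen)  # trim the final step to fit the budget
        sizes.append(size)
        seen += size
    return sizes


# Hypothetical budget: 1000 samples, switching after 75% of the data.
schedule = build_schedule(1000, small_batch=10, large_batch=100, switch_fraction=0.75)
print(len(schedule), schedule[0], schedule[-3:])
```

In this sketch the switch point is expressed as a fraction of the data budget rather than as a step count, so the same setting carries over when the batch sizes change.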

Paper

arXiv: 2602.14208

scaling, efficiency, research

Related