Language Model Ladders
Develops task-specific scaling laws using small "ladder" models (costing about 1% of the target model's compute) to predict large-model task performance to within 2 percentage points. Prediction is done in two steps: first from model and data size to task loss, then from task loss to task performance. Validated on OLMo 2 7B and 13B.
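The two-step chain can be sketched as follows. This is a minimal illustration, assuming a power-law form for step one (loss in model size N and data size D) and a sigmoidal map for step two (loss to accuracy), which are standard choices for task scaling laws; all parameter values below are made up for illustration, not taken from the paper.

```python
import math

# Step 1 (assumed form): task loss as a power law in
# model parameters N and training tokens D.
def task_loss(N, D, A=100.0, alpha=0.3, B=200.0, beta=0.3, E=0.5):
    return A / N**alpha + B / D**beta + E

# Step 2 (assumed form): sigmoidal map from task loss to task accuracy,
# saturating at acc_max for low loss and acc_min for high loss.
def task_accuracy(L, acc_max=0.9, acc_min=0.25, k=5.0, L0=0.9):
    return acc_min + (acc_max - acc_min) / (1.0 + math.exp(k * (L - L0)))

# Chain the two steps to predict accuracy directly from (N, D).
# In practice, both steps would be fit on cheap "ladder" models and
# then extrapolated to the target scale.
ladder = task_accuracy(task_loss(1e8, 1e10))  # small ladder-model scale
target = task_accuracy(task_loss(7e9, 4e12))  # hypothetical target scale
print(ladder, target)
```

In the paper's setup the parameters of both functions are fit on the ladder models only, so the target-scale prediction is a pure extrapolation.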
Paper: arXiv 2412.04403