Refined scaling law for predicting LLM training loss across scales. Introduces a loss-surface function L(N, D) that reduces extrapolation error by 433% relative to Chinchilla's scaling law. Trains ~1,000 models (~3M GPU hours), with a full open-source release.
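As context for what a loss-surface function L(N, D) means here, a minimal sketch below fits the well-known Chinchilla-style parametric form L(N, D) = E + A/N^α + B/D^β to a few (N, D, loss) observations; this is the baseline form the refined law is compared against, not the paper's own functional form, and the data points, initial guesses, and bounds are illustrative assumptions.

```python
# Sketch only: Chinchilla-style baseline L(N, D) = E + A/N^alpha + B/D^beta,
# NOT the refined law from the paper (its functional form is not reproduced here).
import numpy as np
from scipy.optimize import curve_fit

def chinchilla_loss(ND, E, A, B, alpha, beta):
    """Parametric loss surface in model size N (params) and data size D (tokens)."""
    N, D = ND
    return E + A / N**alpha + B / D**beta

# Hypothetical observations: (N params, D tokens, final training loss).
N = np.array([1e8, 1e8, 4e8, 4e8, 1e9, 1e9])
D = np.array([2e9, 8e9, 8e9, 2e10, 2e10, 6e10])
loss = np.array([3.25, 3.05, 2.80, 2.70, 2.52, 2.44])

# Fit the five parameters; p0 values are illustrative starting guesses.
popt, _ = curve_fit(
    chinchilla_loss, (N, D), loss,
    p0=[1.7, 400.0, 400.0, 0.34, 0.28],
    bounds=(0.0, np.inf), maxfev=20000,
)
E, A, B, alpha, beta = popt
print(f"L(N,D) ~ {E:.2f} + {A:.1f}/N^{alpha:.2f} + {B:.1f}/D^{beta:.2f}")

# Extrapolation error (the metric the 433% figure refers to) would then be
# measured by predicting the loss of models larger than any in the fit set.
```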

Paper

arXiv: 2506.10972

scaling, training

Related