"The Science of Pretraining." A 3B model trained on 8T tokens with 200+ controlled ablations. Scores 51.72 overall, matching OLMo-3 7B with less than half the parameters. MATH: 62.80 vs OLMo-3's 39.60.

Introduces the Data Darwinism framework for systematic data processing (L0-L9 taxonomy) and a two-stage adaptive curriculum. Full training trajectory released (logs, checkpoints, data mixtures).
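The paper does not spell out the curriculum here, but a two-stage adaptive curriculum over a quality taxonomy can be sketched as a step-dependent data-mixture schedule. Everything below is illustrative: the function name, the uniform-then-upweighted weighting, and the switch point are assumptions, not the paper's actual recipe; only the idea of ten quality tiers (L0-L9) and two stages comes from the summary above.

```python
# Hypothetical sketch: per-tier sampling weights for a two-stage curriculum.
# Tiers 0..9 stand in for the L0-L9 taxonomy, assumed here to be ascending
# quality labels. Stage boundaries and weights are illustrative only.

def mixture_weights(step, total_steps, n_tiers=10, switch_frac=0.8):
    """Return normalized per-tier sampling weights at a given training step."""
    if step < switch_frac * total_steps:
        # Stage 1: sample uniformly across all quality tiers.
        w = [1.0] * n_tiers
    else:
        # Stage 2: linearly up-weight the higher-quality tiers.
        w = [float(i + 1) for i in range(n_tiers)]
    total = sum(w)
    return [x / total for x in w]

# Stage 1 (step 1000 of 10000): uniform weights.
print(mixture_weights(1000, 10000)[:3])  # → [0.1, 0.1, 0.1]
# Stage 2 (step 9000 of 10000): top tier outweighs the bottom tier.
w2 = mixture_weights(9000, 10000)
print(w2[9] > w2[0])  # → True
```

A real implementation would plug these weights into the sampler of a streaming data loader; the point of the sketch is only that "adaptive curriculum" reduces to a schedule over mixture weights.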

Model Details

Architecture: Dense
Parameters: 3B

Paper

arXiv: 2603.27164

Tags: open-source, open-weight, data, research
