daVinci-LLM
"The Science of Pretraining." A 3B model trained on 8T tokens with 200+ controlled ablations. It scores 51.72 overall, matching OLMo-3 7B at half the parameter count (MATH: 62.80 vs. OLMo-3's 39.60).
Introduces the Data Darwinism framework for systematic data processing (an L0–L9 taxonomy) and a two-stage adaptive curriculum. The full training trajectory is released, including logs, checkpoints, and data mixtures.
Model Details
Architecture: Dense
Parameters: 3B

Paper
arXiv: 2603.27164