A modular post-training recipe that trains independent domain experts separately, then merges them into a Mixture-of-Experts model. Each expert is trained as a two-expert MoE with one frozen "anchor" (preserving the base weights) and one trainable expert, using a progressive unfreezing schedule across the mid-training, SFT, and RLVR stages. After training, shared parameters that diverged across experts are averaged (at minimal cost to quality), and a lightweight router is trained on 5% of the SFT data.
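Roughly what the per-layer structure and merge step could look like (a minimal PyTorch sketch; the module names, sizes, and the `average_diverged_shared` helper are illustrative assumptions, not the paper's actual code):

```python
import copy
import torch
import torch.nn as nn

class AnchoredTwoExpertMoE(nn.Module):
    """Two-expert MoE layer: a frozen "anchor" expert keeps the base weights,
    a trainable expert adapts to the target domain, and a small router mixes them."""
    def __init__(self, base_ffn: nn.Module, d_model: int):
        super().__init__()
        self.anchor = base_ffn                      # frozen copy of the base FFN
        for p in self.anchor.parameters():
            p.requires_grad = False
        self.expert = copy.deepcopy(base_ffn)       # trainable domain expert
        self.router = nn.Linear(d_model, 2)         # lightweight router over the two experts

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        gate = torch.softmax(self.router(x), dim=-1)               # [..., seq, 2]
        outs = torch.stack([self.anchor(x), self.expert(x)], -1)   # [..., seq, d_model, 2]
        return (outs * gate.unsqueeze(-2)).sum(-1)

def average_diverged_shared(params_per_expert: list[dict[str, torch.Tensor]]) -> dict[str, torch.Tensor]:
    """Merge step (assumed form): average shared, non-expert parameters that
    diverged while the domain experts were trained separately."""
    keys = params_per_expert[0].keys()
    return {k: torch.stack([p[k] for p in params_per_expert]).mean(0) for k in keys}

# Usage sketch: wrap one layer's FFN from a base model (sizes are made up)
base_ffn = nn.Sequential(nn.Linear(512, 2048), nn.GELU(), nn.Linear(2048, 512))
layer = AnchoredTwoExpertMoE(base_ffn, d_model=512)
y = layer(torch.randn(2, 16, 512))  # -> [2, 16, 512]
```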

Key benefit: individual experts can be upgraded or replaced without retraining the full model. Achieves a 49.1 average across 19 benchmarks spanning 7 categories, outperforming a monolithic post-training baseline (47.8), with +7.8 on math and +4.7 on code; the full-retraining ceiling is 50.5. Built on FlexOlmo. By Morrison, Adhikesaven, Bhagia, Zaharia, Smith, and Min (Ai2).

foundational, moe, post-training, efficiency
