Zyphra's reasoning successor to Zamba2: an 8B-total / 700M-active MoE trained end-to-end on a full-stack AMD platform — pretrain, midtrain, and SFT all on AMD Instinct MI300 GPUs with AMD networking and software. To Zyphra's knowledge this is the largest publicly released foundation model trained entirely on AMD silicon, and the companion paper documents the systems-level co-design.

Reported benchmarks (using Zyphra's Markovian RSA test-time compute method): 91.9% on AIME'25 and 89.6% on HMMT'25. Zyphra claims the model matches or exceeds DeepSeek-R1-0528 on math and coding despite having under 1B active parameters. It is not currently scored on Artificial Analysis; the numbers above are self-reported from the technical report.
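
Test-time compute methods like the one cited above trade extra inference-time samples for accuracy. The report's Markovian RSA procedure is not reproduced here; the sketch below only illustrates the simplest form of that idea (sample several reasoning traces and keep the most common final answer), and the `generate_solution` and `extract_final_answer` callables are hypothetical stand-ins for a model call and an answer parser.

```python
# Illustrative only: a generic majority-vote test-time compute loop.
# This is NOT the Markovian RSA method from the technical report.
from collections import Counter
from typing import Callable

def solve_with_test_time_compute(
    problem: str,
    generate_solution: Callable[[str], str],     # hypothetical model call
    extract_final_answer: Callable[[str], str],  # hypothetical answer parser
    n_samples: int = 16,
) -> str:
    """Sample several reasoning traces and return the most common final answer."""
    answers = [extract_final_answer(generate_solution(problem)) for _ in range(n_samples)]
    return Counter(answers).most_common(1)[0][0]
```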

Model Details

Architecture: MoE
Parameters: 8B total
Active params: 700M
Training hardware: AMD Instinct MI300
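
The total vs. active split above follows from sparse expert routing: each token passes through only a small fraction of the expert parameters. The numbers in the sketch below (shared/expert split, expert count, routing top-k) are illustrative assumptions chosen to reproduce a rough 8B-total / ~0.7B-active ratio, not the model's published configuration.

```python
# Illustrative only: how a sparse MoE can have ~8B total but <1B active parameters.
# All layer splits, expert counts, and top-k values here are assumptions for the
# arithmetic, not the model's actual configuration.

def moe_param_counts(shared_params: float, expert_params_total: float,
                     num_experts: int, top_k: int) -> tuple[float, float]:
    """Return (total, active-per-token) parameter counts for a top-k routed MoE."""
    total = shared_params + expert_params_total
    # Each token is routed to only top_k of num_experts experts per MoE layer.
    active = shared_params + expert_params_total * top_k / num_experts
    return total, active

# Hypothetical split: 0.5B shared (attention, embeddings) + 7.5B in experts,
# with 1 of 32 experts routed per token (chosen only to illustrate the ratio).
total, active = moe_param_counts(0.5e9, 7.5e9, num_experts=32, top_k=1)
print(f"total ~ {total/1e9:.1f}B, active ~ {active/1e9:.2f}B")
# -> total ~ 8.0B, active ~ 0.73B
```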

Benchmark Scores

Benchmark | Score | Mode
AIME 2025 | 91.9% | with Markovian RSA
HMMT 2025 | 89.6% | with Markovian RSA

Paper

Tags: reasoning, open-weight, moe, amd
