The large-scale successor to ZAYA1-8B: a Mixture-of-Experts model with 74B total and 4B active parameters, trained end-to-end on AMD hardware (MI300X GPUs with 192 GB of VRAM each, Pensando Pollara interconnect), making it one of the largest open foundation models pretrained entirely on AMD silicon. It uses Zyphra's CCA attention variant, with interleaved 4K sliding-window and global attention layers.
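To make the difference between the two layer types concrete, here is a minimal sketch of the attention masks involved. It is illustrative only: it does not implement CCA, and the helper names (`causal_mask`, `layer_windows`) and the `global_every` interleaving ratio are assumptions, since the description does not say how the sliding-window and global layers alternate.

```python
def causal_mask(seq_len, window=None):
    """Build a seq_len x seq_len boolean mask.

    mask[q][k] is True when query position q may attend to key position k.
    window=None gives full (global) causal attention; an integer limits each
    query to the most recent `window` positions, itself included.
    """
    mask = []
    for q in range(seq_len):
        row = []
        for k in range(seq_len):
            visible = k <= q  # causal: never look at future positions
            if window is not None:
                visible = visible and (q - k) < window  # sliding window
            row.append(visible)
        mask.append(row)
    return mask


def layer_windows(num_layers, window, global_every):
    """Hypothetical interleaving: every `global_every`-th layer is global
    (window=None); all other layers use the sliding window.
    The real layer pattern is not stated in the source."""
    return [
        None if (i + 1) % global_every == 0 else window
        for i in range(num_layers)
    ]
```

With a toy window of 4 over 8 positions, the last query sees only positions 4 to 7 in a sliding-window layer but all 8 positions in a global layer. In the model as described the window is 4K tokens, so a sliding-window layer only has to attend over a bounded span even when the context is very long.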

Pretrained on ~15T tokens across two phases, followed by three midtraining phases (~1T tokens each) that progressively extend the context to 256K tokens. Released as a pre-RL reasoning-base checkpoint: it has not undergone instruction/chat tuning or RL post-training and is positioned as a research artifact for studying high-quality pretraining at scale. Open weights (Apache 2.0). Not currently scored on Artificial Analysis.
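The headline figures can be cross-checked with a few lines of arithmetic. This is a rough sketch: the token counts are the approximate ("~") values quoted in the description, the variable names are illustrative, and "256K" is read as 256 × 1024 tokens.

```python
# Figures quoted in the description; the "~" values are approximate.
TOTAL_PARAMS_B = 74              # total parameters, in billions
ACTIVE_PARAMS_B = 4              # active parameters, in billions
PRETRAIN_TOKENS_T = 15           # pretraining tokens, in trillions
MIDTRAIN_PHASES = 3
MIDTRAIN_TOKENS_PER_PHASE_T = 1  # tokens per midtraining phase, in trillions
CONTEXT_K = 256                  # final context length, in units of 1024 tokens

# Fraction of the parameters that are active.
active_fraction = ACTIVE_PARAMS_B / TOTAL_PARAMS_B

# Approximate token budget across pretraining and midtraining.
total_tokens_t = PRETRAIN_TOKENS_T + MIDTRAIN_PHASES * MIDTRAIN_TOKENS_PER_PHASE_T

# Context length expressed in tokens.
context_tokens = CONTEXT_K * 1024
```

Under these assumptions about 5.4% of the parameters are active, the combined pretraining and midtraining budget comes to roughly 18T tokens, and 256K corresponds to 262,144 tokens.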

Model Details

Architecture: MoE
Parameters: 74B (total)
Active params: 4B
Context window: 262,144 tokens
Training tokens: ~15T (pretraining)
Training hardware: AMD Instinct MI300X
License: Apache 2.0
Tags: moe, open-weight, reasoning, amd
