ZAYA1-74B-Preview | Lab Index

The large-scale successor to ZAYA1-8B: a 74B-total / 4B-active Mixture-of-Experts model trained end-to-end on AMD (MI300X GPUs with 192GB VRAM, Pensando Pollara interconnect), making it one of the largest open foundation models pretrained entirely on AMD silicon. Uses Zyphra's CCA attention variant with interleaved 4K sliding-window and global attention layers.
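To make the interleaved sliding-window and global attention layout concrete, here is a minimal sketch in plain Python. It is not Zyphra's implementation and does not model CCA itself. The helper names `causal_mask` and `layer_windows`, and the `global_every` interleave ratio, are hypothetical, since the source does not state how the two layer types alternate.

```python
def causal_mask(seq_len, window=None):
    """Build a seq_len x seq_len boolean mask.

    mask[i][j] is True when query position i may attend to key position j.
    window=None gives ordinary (global) causal attention; an integer window
    restricts each query to the most recent `window` positions, itself included.
    """
    mask = []
    for i in range(seq_len):
        row = []
        for j in range(seq_len):
            visible = j <= i  # causal: never look at future positions
            if window is not None:
                visible = visible and (i - j) < window  # sliding-window limit
            row.append(visible)
        mask.append(row)
    return mask


def layer_windows(num_layers, window, global_every):
    """Assign a window size per layer (None means global attention).

    ASSUMPTION: every `global_every`-th layer is global and the rest use the
    sliding window. The real interleave pattern is not given in the source.
    """
    return [
        None if (k + 1) % global_every == 0 else window
        for k in range(num_layers)
    ]


if __name__ == "__main__":
    # Tiny illustrative sizes; the model card cites a 4K (4096-token) window.
    for row in causal_mask(5, window=2):
        print("".join("x" if v else "." for v in row))
    print(layer_windows(num_layers=4, window=4096, global_every=2))
```

Sliding-window layers keep per-token attention cost bounded by the window size, while the global layers let information travel across the full context.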

Pretrained on ~15T tokens across two phases, followed by three midtraining phases (~1T tokens each) that progressively extend the context window to 256K tokens. Released as a pre-RL reasoning-base checkpoint: it has not undergone instruction/chat tuning or RL post-training, and is positioned as a research artifact for studying high-quality pretraining at scale. Open weights (Apache 2.0). Not currently scored on Artificial Analysis.
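The headline figures can be cross-checked with a few lines of arithmetic. This is a rough sketch using the approximate numbers quoted on this page. It assumes the three midtraining phases are counted in addition to the ~15T pretraining tokens, which the source does not state explicitly.

```python
# Approximate figures quoted on this page (all rounded in the source).
PRETRAIN_TOKENS = 15 * 10**12            # ~15T tokens across two phases
MIDTRAIN_PHASES = 3
MIDTRAIN_TOKENS_PER_PHASE = 1 * 10**12   # ~1T tokens each
TOTAL_PARAMS = 74 * 10**9
ACTIVE_PARAMS = 4 * 10**9

# ASSUMPTION: midtraining tokens are additional to the pretraining budget.
total_tokens = PRETRAIN_TOKENS + MIDTRAIN_PHASES * MIDTRAIN_TOKENS_PER_PHASE

# "256K" context in binary units matches the listed context window.
context_tokens = 256 * 1024  # 262144

# Fraction of parameters active per token in the MoE.
active_fraction = ACTIVE_PARAMS / TOTAL_PARAMS

if __name__ == "__main__":
    print(f"total training tokens: ~{total_tokens / 10**12:.0f}T")
    print(f"context window: {context_tokens} tokens")
    print(f"active parameter fraction: {active_fraction:.1%}")
```

Under that assumption, the 15T figure listed under training tokens would cover pretraining only, with midtraining bringing the total to roughly 18T.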

Blog (Zyphra) | HuggingFace

Model Details

Architecture: MoE
Parameters: 74B
Active params: 4B
Context window: 262,144 tokens
Training tokens: 15T
Training hardware: AMD Instinct MI300X
License: Apache 2.0

Tags: moe, open-weight, reasoning, amd
