Zyphra's flagship architectural family. Zamba (May 2024) introduced a compact 7B hybrid that combines a Mamba SSM backbone with a single shared attention module, capturing most of the benefit of attention at minimal parameter cost. Zamba2 (November 2024) extended this into a 1.2B / 2.7B / 7.4B suite trained on up to 3T tokens, with several improvements: a Mamba1 → Mamba2 backbone, two alternating shared attention blocks (instead of one) with LoRA projectors for depth specialization, and rotary position embeddings.
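
A minimal PyTorch sketch may make the weight sharing concrete: Mamba-style blocks in sequence, with two shared attention blocks invoked in alternation every few layers, each call site specialized by its own LoRA projector. The 1:6 spacing, dimensions, stub Mamba block, and all names here are illustrative assumptions (RoPE is omitted for brevity); this is a sketch of the idea, not Zyphra's implementation.

```python
import torch
import torch.nn as nn

class LoRA(nn.Module):
    """Low-rank adapter that specializes a shared block at one call site."""
    def __init__(self, dim, rank=8):
        super().__init__()
        self.down = nn.Linear(dim, rank, bias=False)
        self.up = nn.Linear(rank, dim, bias=False)
        nn.init.zeros_(self.up.weight)  # adapter starts as a no-op

    def forward(self, x):
        return self.up(self.down(x))

class SharedAttention(nn.Module):
    """One attention block whose weights are reused at many depths."""
    def __init__(self, dim, n_heads=8):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, n_heads, batch_first=True)

    def forward(self, x, lora):
        h = x + lora(x)  # depth-specific LoRA projection of the shared input
        out, _ = self.attn(h, h, h, need_weights=False)
        return x + out

class MambaBlockStub(nn.Module):
    """Stand-in for a Mamba2 block (a real SSM layer would go here)."""
    def __init__(self, dim):
        super().__init__()
        self.mix = nn.Sequential(nn.Linear(dim, dim), nn.SiLU(), nn.Linear(dim, dim))

    def forward(self, x):
        return x + self.mix(x)

class ZambaLikeStack(nn.Module):
    def __init__(self, dim=512, n_layers=24, attn_every=6):
        super().__init__()
        self.blocks = nn.ModuleList([MambaBlockStub(dim) for _ in range(n_layers)])
        self.shared = nn.ModuleList([SharedAttention(dim) for _ in range(2)])  # two shared blocks
        self.loras = nn.ModuleList([LoRA(dim) for _ in range(n_layers // attn_every)])  # one per call site
        self.attn_every = attn_every

    def forward(self, x):
        call = 0
        for i, blk in enumerate(self.blocks):
            x = blk(x)
            if (i + 1) % self.attn_every == 0:
                x = self.shared[call % 2](x, self.loras[call])  # alternate the two shared blocks
                call += 1
        return x

x = torch.randn(2, 128, 512)
print(ZambaLikeStack()(x).shape)  # torch.Size([2, 128, 512])
```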

Because attention is shared and applied at only roughly one in six block positions, Zamba2 stores KV caches only at those positions, giving a KV footprint roughly 6x smaller than a pure Transformer of similar quality, with correspondingly lower inference latency and memory cost. Weights are open under the Apache 2.0 license. Note: not currently scored on Artificial Analysis; benchmark claims are self-reported.
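
To see where the ~6x figure comes from, here is a back-of-envelope calculation comparing KV-cache size with attention at every layer versus at one in six positions. The layer count, head geometry, context length, and bf16 storage are illustrative assumptions, not Zamba2-7B's published configuration.

```python
def kv_cache_bytes(n_attn_layers, n_kv_heads, head_dim, seq_len, bytes_per_elem=2):
    # 2 tensors (K and V) per attention layer, each [seq_len, n_kv_heads, head_dim]
    return 2 * n_attn_layers * seq_len * n_kv_heads * head_dim * bytes_per_elem

layers, heads, hdim, ctx = 32, 32, 128, 4096
dense = kv_cache_bytes(layers, heads, hdim, ctx)        # attention at every layer
hybrid = kv_cache_bytes(layers // 6, heads, hdim, ctx)  # attention at ~1 in 6 positions
print(f"dense:  {dense / 2**20:.0f} MiB")   # 2048 MiB
print(f"hybrid: {hybrid / 2**20:.0f} MiB")  # 320 MiB, roughly 6x smaller
```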

Model Details

Architecture: Dense
Parameters: 7.4B
Training tokens: 3T
License: Apache 2.0

Variants

Name         Parameters  Notes
Zamba-7B     7B          Original 2024 release; single shared attention block; Mamba1 backbone
Zamba2-1.2B  1.2B        Zamba2 design at small scale
Zamba2-2.7B  2.7B        Zamba2 design at mid scale
Zamba2-7B    7.4B        Mamba2 backbone; two alternating shared attention blocks with LoRA; RoPE
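
A hedged usage sketch for loading one of the variants above through Hugging Face transformers follows. The "Zyphra/Zamba2-7B" repo id and native transformers support are assumptions to verify against the official model cards.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Zyphra/Zamba2-7B"  # assumed repo id; swap for the 1.2B or 2.7B variant
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",  # requires the accelerate package
)

inputs = tokenizer("Hybrid SSM-attention models are", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=50)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```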

Tags: foundational, open-weight, hybrid-ssm, efficiency