Zyphra's flagship architectural family. Zamba (May 2024) introduced a compact 7B hybrid that combines a Mamba SSM backbone with a single shared attention module, capturing most of the benefit of attention at minimal parameter cost. Zamba2 (November 2024) extended this into a 1.2B / 2.7B / 7.4B suite trained on up to 3T tokens, with several improvements: a Mamba1 → Mamba2 backbone, two alternating shared attention blocks (instead of one) with LoRA projectors for depth specialization, and rotary position embeddings.
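
A minimal PyTorch sketch may make the weight sharing concrete: Mamba-style blocks in sequence, with two shared attention blocks invoked in alternation every few layers, each call site specialized by its own LoRA projector. The 1:6 spacing, dimensions, stub Mamba block, and all names here are illustrative assumptions (RoPE is omitted for brevity); this is a sketch of the idea, not Zyphra's implementation.

```python
import torch
import torch.nn as nn

class LoRA(nn.Module):
    """Low-rank adapter that specializes a shared block at one call site."""
    def __init__(self, dim, rank=8):
        super().__init__()
        self.down = nn.Linear(dim, rank, bias=False)
        self.up = nn.Linear(rank, dim, bias=False)
        nn.init.zeros_(self.up.weight)  # adapter starts as a no-op

    def forward(self, x):
        return self.up(self.down(x))

class SharedAttention(nn.Module):
    """One attention block whose weights are reused at many depths."""
    def __init__(self, dim, n_heads=8):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, n_heads, batch_first=True)

    def forward(self, x, lora):
        h = x + lora(x)  # depth-specific LoRA projection of the shared input
        out, _ = self.attn(h, h, h, need_weights=False)
        return x + out

class MambaBlockStub(nn.Module):
    """Stand-in for a Mamba2 block (a real SSM layer would go here)."""
    def __init__(self, dim):
        super().__init__()
        self.mix = nn.Sequential(nn.Linear(dim, dim), nn.SiLU(), nn.Linear(dim, dim))

    def forward(self, x):
        return x + self.mix(x)

class ZambaLikeStack(nn.Module):
    def __init__(self, dim=512, n_layers=24, attn_every=6):
        super().__init__()
        self.blocks = nn.ModuleList([MambaBlockStub(dim) for _ in range(n_layers)])
        self.shared = nn.ModuleList([SharedAttention(dim) for _ in range(2)])  # two shared blocks
        self.loras = nn.ModuleList([LoRA(dim) for _ in range(n_layers // attn_every)])  # one per call site
        self.attn_every = attn_every

    def forward(self, x):
        call = 0
        for i, blk in enumerate(self.blocks):
            x = blk(x)
            if (i + 1) % self.attn_every == 0:
                x = self.shared[call % 2](x, self.loras[call])  # alternate the two shared blocks
                call += 1
        return x

x = torch.randn(2, 128, 512)
print(ZambaLikeStack()(x).shape)  # torch.Size([2, 128, 512])
```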

Because attention is shared and applied at only roughly one in six block positions, Zamba2 stores KV caches only at those positions, giving a KV footprint roughly 6x smaller than a pure Transformer of similar quality, with correspondingly lower inference latency and memory cost. Weights are open under the Apache 2.0 license. Note: not currently scored on Artificial Analysis; benchmark claims are self-reported.
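
To see where the ~6x figure comes from, here is a back-of-envelope calculation comparing KV-cache size with attention at every layer versus at one in six positions. The layer count, head geometry, context length, and bf16 storage are illustrative assumptions, not Zamba2-7B's published configuration.

```python
def kv_cache_bytes(n_attn_layers, n_kv_heads, head_dim, seq_len, bytes_per_elem=2):
    # 2 tensors (K and V) per attention layer, each [seq_len, n_kv_heads, head_dim]
    return 2 * n_attn_layers * seq_len * n_kv_heads * head_dim * bytes_per_elem

layers, heads, hdim, ctx = 32, 32, 128, 4096
dense = kv_cache_bytes(layers, heads, hdim, ctx)        # attention at every layer
hybrid = kv_cache_bytes(layers // 6, heads, hdim, ctx)  # attention at ~1 in 6 positions
print(f"dense:  {dense / 2**20:.0f} MiB")   # 2048 MiB
print(f"hybrid: {hybrid / 2**20:.0f} MiB")  # 320 MiB, roughly 6x smaller
```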

Model Details

Architecture: Dense
Parameters: 7.4B
Training tokens: 3T
License: Apache 2.0

Variants

Name         Parameters  Notes
Zamba-7B     7B          Original 2024 release; single shared attention block; Mamba1 backbone
Zamba2-1.2B  1.2B        Zamba2 design at small scale
Zamba2-2.7B  2.7B        Zamba2 design at mid scale
Zamba2-7B    7.4B        Mamba2 backbone; two alternating shared attention blocks with LoRA; RoPE
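
A hedged usage sketch for loading one of the variants above through Hugging Face transformers follows. The "Zyphra/Zamba2-7B" repo id and native transformers support are assumptions to verify against the official model cards.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Zyphra/Zamba2-7B"  # assumed repo id; swap for the 1.2B or 2.7B variant
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",  # requires the accelerate package
)

inputs = tokenizer("Hybrid SSM-attention models are", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=50)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```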

Tags: foundational, open-weight, hybrid-ssm, efficiency