Jamba
First production-grade SSM-Transformer-MoE hybrid architecture. It interleaves Mamba state-space layers with Transformer attention layers and adds mixture-of-experts (MoE) routing. 256K-token context window. Fits on a single 80GB GPU despite its scale.
The hybrid architecture achieves 2.5x faster inference than comparably sized dense Transformers. Jamba 1.7 Large (398B total / 94B active parameters) is the current flagship. AA Intelligence Index: 11. The Jamba architecture is a genuine contribution to the field, demonstrating that SSMs and attention are complementary rather than competing. License: Apache 2.0 (Jamba v0.1). By Lieber, Lenz, Shoham et al.
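The interleaving described above follows a fixed schedule. A minimal sketch, using the layer-placement constants published for Jamba v0.1 (attention every 8th layer at offset 4; MoE replacing the dense MLP every 2nd layer at offset 1) — the function name and defaults here are illustrative, not AI21's code:

```python
def jamba_layer_plan(n_layers=32, attn_period=8, attn_offset=4,
                     moe_period=2, moe_offset=1):
    """Sketch of Jamba's layer schedule: within every block of 8 layers,
    1 uses attention and 7 use Mamba, and every 2nd layer swaps the
    dense MLP for a MoE layer. Offsets match the Jamba v0.1 config."""
    plan = []
    for i in range(n_layers):
        mixer = "attention" if i % attn_period == attn_offset else "mamba"
        mlp = "moe" if i % moe_period == moe_offset else "dense"
        plan.append((mixer, mlp))
    return plan

plan = jamba_layer_plan()
# In a 32-layer stack: 4 attention layers, 28 Mamba layers, 16 MoE layers.
```

The 1:7 attention-to-Mamba ratio keeps the KV cache small (only the attention layers store one), which is what lets the 256K context fit on a single 80GB GPU.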
Model Details
Architecture MoE (hybrid SSM-Transformer)
Parameters 398B
Active params 94B
Context window 256,000 tokens
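The gap between total (398B) and active (94B) parameters comes from MoE routing: each token is processed by only the top-scoring experts in each MoE layer (16 experts with top-2 selection in Jamba v0.1), so most expert weights sit idle per token. A toy sketch of top-2 routing for one token — the router scores below are made up for illustration:

```python
import numpy as np

def top2_route(logits):
    """Pick the 2 highest-scoring experts for one token and return
    their indices plus softmax-renormalized gate weights."""
    k = 2
    idx = np.argsort(logits)[-k:][::-1]          # top-2, best first
    w = np.exp(logits[idx] - logits[idx].max())  # stable softmax over the 2
    w /= w.sum()
    return idx, w

# Toy router scores for 16 experts (hypothetical values).
logits = np.array([0.1, 2.0, -1.0, 1.5] + [0.0] * 12)
experts, weights = top2_route(logits)
# experts → [1, 3]; weights sum to 1, heavier weight on expert 1
```

Only the selected experts' MLP weights are read per token, which is why active-parameter count, not total, governs per-token compute.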
Variants
| Name | Parameters | Notes |
|---|---|---|
| Jamba v0.1 | 52B | Original release, Apache 2.0 |
| Jamba 1.5 Mini | 52B | — |
| Jamba 1.7 Large | 398B | 94B active; current flagship |
Paper
arXiv: 2403.19887