Llama 4
First MoE Llama. Scout: 109B total / 17B active (16 experts), 10M-token context. Maverick: 400B total / 17B active (128 experts), 1M-token context. Both are natively multimodal (text + image), trained with an early-fusion architecture.
Llama 4 is also the first Llama with built-in vision capabilities. Behemoth (~2T total / 288B active) was announced but delayed. AA Intelligence Index: 18 (Maverick), 14 (Scout). Released under the Llama 4 Community License.
Model Details
Architecture MoE
Parameters 400B (Maverick)
Active params 17B
Context window 1,000,000 (Maverick)
Variants
| Name | Total params | Active params | Experts | Context |
|---|---|---|---|---|
| Llama 4 Scout | 109B | 17B | 16 | 10M |
| Llama 4 Maverick | 400B | 17B | 128 | 1M |
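The total-vs-active parameter split above comes from MoE routing: a router scores every expert per token, and only the top-k experts actually run. A minimal sketch of that idea, where the expert count mirrors Scout's 16 experts but the scores and routing details are illustrative, not Meta's implementation:

```python
# Minimal top-k MoE routing sketch: a router scores each expert per token
# and only the top-k experts execute, so active params << total params.
# Expert count matches Scout (16 experts); everything else is hypothetical.

def route_token(router_logits, k=1):
    """Return indices of the k highest-scoring experts for one token."""
    ranked = sorted(range(len(router_logits)),
                    key=lambda i: router_logits[i], reverse=True)
    return ranked[:k]

NUM_EXPERTS = 16  # Scout's expert count (from the table above)
# Hypothetical router scores for a single token:
scores = [0.02 * i if i != 5 else 3.0 for i in range(NUM_EXPERTS)]
print(route_token(scores, k=1))  # expert 5 scores highest → [5]
```

Because each token touches only the routed experts plus the shared layers, Scout runs ~17B of its 109B parameters per forward pass, which is what makes a 109B model serve at roughly 17B-dense cost.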