Mistral Small 4
119B total / 6.5B active MoE (128 experts, top-4 routing). First Mistral model to unify instruct, reasoning, multimodal, and agentic coding in one architecture. 256K context. 3x the throughput of Mistral Small 3, with 40% lower latency. Configurable reasoning effort.
AA Intelligence Index: 12 (non-reasoning). License: Apache 2.0.
May 31, 2026: NVFP4-quantized variant, built in collaboration with NVIDIA, shipped on Hugging Face (Mistral-Small-4-119B-2603-NVFP4) for faster inference on Blackwell GPUs at near-BF16 quality.
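To put the NVFP4 variant in context, here is a rough back-of-the-envelope estimate of weight storage for 119B parameters. It is a sketch only, under the assumption that NVFP4 stores 4-bit values plus one 8-bit scale per 16-weight block (about 4.5 bits per weight). The helper name `weight_gb` is illustrative, and the real checkpoint size may differ if some layers are kept in higher precision.

```python
# Rough weight-memory estimate for a 119B-parameter model.
# Assumption: NVFP4 = 4-bit values + one 8-bit scale per 16-weight block,
# i.e. about 4.5 bits per weight. Per-tensor scales and any layers kept
# in higher precision are ignored here.

TOTAL_PARAMS = 119e9


def weight_gb(params: float, bits_per_weight: float) -> float:
    """Return weight storage in gigabytes (1 GB = 1e9 bytes)."""
    return params * bits_per_weight / 8 / 1e9


bf16_gb = weight_gb(TOTAL_PARAMS, 16)
nvfp4_gb = weight_gb(TOTAL_PARAMS, 4 + 8 / 16)

print(f"BF16:  {bf16_gb:.0f} GB")
print(f"NVFP4: {nvfp4_gb:.0f} GB")
print(f"ratio: {bf16_gb / nvfp4_gb:.2f}x")
```

This covers weights only. The KV cache and activations add to the total, especially at long context lengths.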
Model Details
Architecture: MoE
Parameters: 119B
Active params: 6.5B
Context window: 256,000 tokens
AA Intelligence Index: 12
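For readers unfamiliar with sparse MoE routing, the "128 experts, top-4 routing" figure can be illustrated with a minimal sketch. This is a generic top-k router, not Mistral's published implementation. The function name `route`, the random logits, and the choice to softmax over only the selected experts are all assumptions made for illustration.

```python
import math
import random

NUM_EXPERTS = 128  # from the model card
TOP_K = 4          # from the model card


def route(logits, k=TOP_K):
    """Pick the k highest-scoring experts and softmax-normalize their weights.

    Assumption: the softmax is taken over the selected experts only;
    real routers may normalize differently.
    """
    top = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:k]
    peak = max(logits[i] for i in top)  # subtract the max for numerical stability
    exps = [math.exp(logits[i] - peak) for i in top]
    total = sum(exps)
    return [(i, e / total) for i, e in zip(top, exps)]


# Hypothetical router scores for a single token.
random.seed(0)
router_logits = [random.gauss(0.0, 1.0) for _ in range(NUM_EXPERTS)]
selected = route(router_logits)

for expert_id, weight in selected:
    print(f"expert {expert_id:3d}  weight {weight:.3f}")

print(f"experts used per token: {TOP_K / NUM_EXPERTS:.1%}")
print(f"active parameter share: {6.5 / 119:.1%}")
```

The active parameter share (6.5B of 119B) is larger than the share of experts used per token (4 of 128). That is expected in MoE models, where non-expert components such as attention and embeddings run for every token.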