119B total / 6.5B active MoE (128 experts, top-4 routing). First Mistral model to unify instruct, reasoning, multimodal, and agentic coding in one architecture. 256K context. 3x the throughput of Mistral Small 3, with 40% lower latency. Configurable reasoning effort.
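The "128 experts, top-4 routing" figure explains why only a small slice of the 119B parameters runs per token. Below is a minimal sketch of generic top-k gating under stated assumptions. The card does not describe Mistral's router (softmax placement, load balancing, shared experts), so this is not the model's implementation. The toy experts and the `route`/`moe_layer` helpers are hypothetical.

```python
import math
import random

NUM_EXPERTS = 128  # from the model card
TOP_K = 4          # from the model card


def route(router_logits, k=TOP_K):
    """Return (expert_index, weight) for the k highest-scoring experts.

    Weights are a softmax over the selected logits only, so they sum to 1.
    (Assumption: real routers may normalize differently.)
    """
    top = sorted(range(len(router_logits)),
                 key=lambda i: router_logits[i], reverse=True)[:k]
    m = max(router_logits[i] for i in top)  # subtract max for numerical stability
    exps = [math.exp(router_logits[i] - m) for i in top]
    total = sum(exps)
    return [(i, e / total) for i, e in zip(top, exps)]


def moe_layer(x, router_logits, experts, k=TOP_K):
    """Weighted sum of the outputs of only the k routed experts; the rest never run."""
    out = [0.0] * len(x)
    for idx, weight in route(router_logits, k):
        y = experts[idx](x)
        out = [o + weight * v for o, v in zip(out, y)]
    return out


# Hypothetical toy experts: expert i scales its input by (i + 1) and logs the call.
calls = []


def make_expert(i):
    def expert(x):
        calls.append(i)
        return [(i + 1) * v for v in x]
    return expert


experts = [make_expert(i) for i in range(NUM_EXPERTS)]
random.seed(0)
logits = [random.gauss(0.0, 1.0) for _ in range(NUM_EXPERTS)]
output = moe_layer([1.0, 2.0], logits, experts)
print(f"experts run: {len(calls)} of {NUM_EXPERTS}")  # experts run: 4 of 128
```

Routing to 4 of 128 experts touches 4/128 ≈ 3.1% of the routed expert weights. The overall active fraction is 6.5B / 119B ≈ 5.5%, which is higher, plausibly because attention, embeddings, and any shared layers run for every token.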

Artificial Analysis (AA) Intelligence Index: 12 (non-reasoning). License: Apache 2.0.

May 31, 2026: an NVFP4-quantized variant, built in collaboration with NVIDIA, shipped on Hugging Face (Mistral-Small-4-119B-2603-NVFP4) for faster inference on Blackwell GPUs at near-BF16 quality.
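The sketch below shows where the small quality gap relative to BF16 comes from. NVIDIA describes NVFP4 as follows: 4-bit E2M1 values, grouped into 16-element blocks, with each block sharing an FP8 (E4M3) scale and an additional per-tensor FP32 scale on top. The sketch simulates that block-scaled rounding in plain Python. It deliberately simplifies: the block scale stays a Python float instead of being encoded in E4M3, and the per-tensor scale is omitted. So it illustrates the error behavior, not the actual kernels or the checkpoint's storage layout. The helper names and toy weights are hypothetical.

```python
import math

# E2M1 (FP4) magnitudes: 1 sign bit, 2 exponent bits, 1 mantissa bit.
E2M1_VALUES = [0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0]
BLOCK_SIZE = 16  # elements sharing one scale factor


def quantize_block(block):
    """Map a block of floats to one scale plus signed E2M1 codes."""
    amax = max(abs(v) for v in block)
    # Map the block's largest magnitude onto the largest E2M1 value (6.0).
    # Simplification: the real format stores this scale in FP8 (E4M3).
    scale = amax / 6.0 if amax > 0 else 1.0
    codes = []
    for v in block:
        mag = abs(v) / scale
        # Round to the nearest representable magnitude; ties go to the smaller one.
        nearest = min(E2M1_VALUES, key=lambda q: abs(q - mag))
        codes.append(math.copysign(nearest, v))
    return scale, codes


def dequantize_block(scale, codes):
    return [scale * c for c in codes]


def fake_quantize(values):
    """Quantize then dequantize block by block to expose the rounding error."""
    out = []
    for start in range(0, len(values), BLOCK_SIZE):
        scale, codes = quantize_block(values[start:start + BLOCK_SIZE])
        out.extend(dequantize_block(scale, codes))
    return out


weights = [0.1 * i - 1.0 for i in range(40)]  # hypothetical toy weights
approx = fake_quantize(weights)
worst = max(abs(a - b) for a, b in zip(weights, approx))
print(f"max abs error: {worst:.4f}")
```

Small blocks keep outliers local, which is the design point. The largest gap between E2M1 values is 2 (from 4 to 6), so each element's error is at most its own block's max magnitude divided by 6, rather than being driven by the largest value in the whole tensor.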

Model Details

Architecture: MoE
Parameters: 119B
Active params: 6.5B
Context window: 256,000 tokens
AA Intelligence Index: 12
Tags: moe, open-weight, reasoning, multimodal, agentic