Mistral's largest model: a 675B-total / 41B-active granular mixture-of-experts (MoE) architecture, plus a 2.5B-parameter vision encoder for native multimodality. 256K context window. Trained on 3,000 H200 GPUs.
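
A 41B-active slice of a 675B-total model follows from top-k expert routing: each token is sent to only a few experts, so only their weights run. The sketch below illustrates the general top-k MoE pattern; the expert count, k, and dimensions are illustrative placeholders, not Mistral's actual configuration.

```python
# Minimal top-k mixture-of-experts layer (illustrative, not Mistral's design).
import torch
import torch.nn as nn
import torch.nn.functional as F

class TopKMoE(nn.Module):
    def __init__(self, d_model=512, d_ff=2048, n_experts=8, k=2):
        super().__init__()
        self.k = k
        self.router = nn.Linear(d_model, n_experts, bias=False)
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, d_ff), nn.GELU(),
                          nn.Linear(d_ff, d_model))
            for _ in range(n_experts)
        )

    def forward(self, x):                       # x: (tokens, d_model)
        logits = self.router(x)                 # (tokens, n_experts)
        weights, idx = logits.topk(self.k, dim=-1)
        weights = F.softmax(weights, dim=-1)    # normalize over chosen experts
        out = torch.zeros_like(x)
        # Only the k routed experts run per token, so active parameters
        # per token are a small fraction of the total parameter count.
        for e, expert in enumerate(self.experts):
            rows, slot = (idx == e).nonzero(as_tuple=True)
            if rows.numel() == 0:
                continue
            out[rows] += weights[rows, slot, None] * expert(x[rows])
        return out

x = torch.randn(4, 512)
print(TopKMoE()(x).shape)  # torch.Size([4, 512])
```

"Granular" MoE designs push this further by using many small experts rather than a few large ones, which gives the router finer control over which parameters activate.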

Scores 73.11% on MMLU-Pro and 93.60% on MATH-500, competitive with frontier models across reasoning, coding, and multilingual tasks. Released under the Apache 2.0 license.

Model Details

Architecture         Mixture-of-experts (MoE)
Total parameters     675B
Active parameters    41B
Context window       256,000 tokens
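
The ratio of the two parameter figures gives the per-token sparsity. A small sketch of that arithmetic, with hypothetical field names (not an official API):

```python
# Spec figures from the table above; field names are illustrative only.
from dataclasses import dataclass

@dataclass(frozen=True)
class ModelSpec:
    total_params: float = 675e9    # 675B total (MoE)
    active_params: float = 41e9    # 41B active per token
    context_window: int = 256_000  # tokens

    @property
    def active_fraction(self) -> float:
        return self.active_params / self.total_params

spec = ModelSpec()
print(f"{spec.active_fraction:.1%} of parameters active per token")  # 6.1%
```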
Tags: moe, open-weight, frontier, multimodal