Mistral Large 2
123B-parameter dense model with a 128K-token context window. Mistral reports performance on par with GPT-4o, Claude 3 Opus, and Llama 3 405B, and an MMLU score of 84.0%. Supports 80+ programming languages and dozens of natural languages.
Model Details
Architecture: Dense
Parameters: 123B
Context window: 128,000 tokens
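Because the model is dense, every one of its 123B parameters is used for each token, so the parameter count directly sets the size of the weights. The sketch below is a rough, hypothetical back-of-the-envelope estimate of weight storage at a few common numeric precisions. It assumes a uniform bytes-per-parameter width and ignores the KV cache for long contexts, activations, and framework overhead, so real memory requirements will be higher. The helper name `weight_memory_gb` and the precision table are illustrative assumptions, not part of any official tooling.

```python
# Hypothetical helper: rough weight-storage estimate for a dense model.
# Assumption: every parameter is stored at the same width; KV cache,
# activations, and runtime overhead are deliberately ignored.
PARAMS = 123e9  # parameter count from the model card

# Assumed storage width (bytes per parameter) for common precisions.
BYTES_PER_PARAM = {"fp32": 4, "bf16": 2, "int8": 1, "int4": 0.5}


def weight_memory_gb(params: float, dtype: str) -> float:
    """Return the approximate weight size in gigabytes (1 GB = 1e9 bytes)."""
    return params * BYTES_PER_PARAM[dtype] / 1e9


if __name__ == "__main__":
    for dtype in BYTES_PER_PARAM:
        print(f"{dtype}: ~{weight_memory_gb(PARAMS, dtype):.0f} GB")
```

At bf16 this works out to roughly 246 GB for the weights alone, which is why a model of this size is typically served across multiple accelerators or in quantized form.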