Mistral 7B
Mistral's debut model and a landmark in efficient open-weight LLMs. 7.3B dense parameters with two key architectural innovations: Grouped-Query Attention (GQA) for faster inference and Sliding Window Attention (SWA) for efficient long-context handling. 32K context window.
Outperformed Llama 2 13B across all benchmarks and Llama 1 34B on reasoning, math, and code (MMLU: 60.1%, HellaSwag: 84.0%). Released under the Apache 2.0 license, it spawned an enormous ecosystem of fine-tunes and derivatives across the open-source community.
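To make the two innovations above concrete, here is a minimal NumPy sketch combining them: each group of query heads shares one key/value head (GQA), and each position attends only to itself and the previous `window - 1` tokens (SWA). All names, shapes, and parameters here are illustrative assumptions for exposition, not Mistral's actual implementation or hyperparameters.

```python
import numpy as np

def gqa_sliding_attention(x, Wq, Wk, Wv, n_heads, n_kv_heads, window):
    """Grouped-Query Attention with a sliding-window causal mask (sketch).

    Hypothetical shapes: x is (seq, d_model); Wq is (d_model, n_heads*d_head);
    Wk and Wv are (d_model, n_kv_heads*d_head). Each group of
    n_heads // n_kv_heads query heads shares one K/V head, which shrinks the
    KV cache; the window mask bounds per-token attention cost.
    """
    seq, _ = x.shape
    d_head = Wq.shape[1] // n_heads
    group = n_heads // n_kv_heads

    q = (x @ Wq).reshape(seq, n_heads, d_head)
    k = (x @ Wk).reshape(seq, n_kv_heads, d_head)
    v = (x @ Wv).reshape(seq, n_kv_heads, d_head)

    # Sliding-window causal mask: position i sees positions (i-window, i].
    i = np.arange(seq)[:, None]
    j = np.arange(seq)[None, :]
    mask = (j <= i) & (j > i - window)

    out = np.zeros((seq, n_heads, d_head))
    for h in range(n_heads):
        kv = h // group  # the single K/V head shared by this query group
        scores = (q[:, h] @ k[:, kv].T) / np.sqrt(d_head)
        scores = np.where(mask, scores, -np.inf)
        weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
        weights /= weights.sum(axis=-1, keepdims=True)
        out[:, h] = weights @ v[:, kv]
    return out.reshape(seq, n_heads * d_head)

# Toy usage with made-up dimensions: 4 query heads sharing 2 KV heads, window of 3.
rng = np.random.default_rng(0)
x = rng.normal(size=(6, 16))
Wq = rng.normal(size=(16, 16))
Wk = rng.normal(size=(16, 8))
Wv = rng.normal(size=(16, 8))
out = gqa_sliding_attention(x, Wq, Wk, Wv, n_heads=4, n_kv_heads=2, window=3)
```

Because tokens beyond the window are reached transitively through stacked layers, SWA extends the effective receptive field far past the per-layer window while keeping per-token attention cost constant.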
Model Details
Architecture: Dense
Parameters: 7.3B
Context window: 32,000 tokens
Paper: arXiv:2310.06825