Mixtral 8x22B
Scaled MoE architecture to 141B total / 39B active parameters per token (8 experts, top-2 routing). 64K context window. MMLU: 77.3%. Outperforms Command R+ and Llama 2 70B. Apache 2.0 license.
Model Details
Architecture Sparse Mixture of Experts (MoE)
Total parameters 141B
Active parameters per token 39B
Context window 64K tokens
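To make "8 experts, top-2 routing" concrete, here is a minimal PyTorch sketch of a sparse MoE feed-forward layer: a router scores all experts per token, only the top-2 experts are evaluated, and their outputs are combined with renormalized gate weights. The class name, dimensions, and GELU expert MLP are illustrative assumptions, not Mixtral's actual implementation.

```python
# Illustrative top-2 expert routing (not Mixtral's real code or dimensions).
import torch
import torch.nn as nn
import torch.nn.functional as F

class Top2MoELayer(nn.Module):
    def __init__(self, d_model=512, d_ff=2048, n_experts=8, top_k=2):
        super().__init__()
        self.top_k = top_k
        self.gate = nn.Linear(d_model, n_experts, bias=False)  # router
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, d_ff), nn.GELU(), nn.Linear(d_ff, d_model))
            for _ in range(n_experts)
        )

    def forward(self, x):                       # x: (tokens, d_model)
        logits = self.gate(x)                   # (tokens, n_experts)
        weights, idx = logits.topk(self.top_k, dim=-1)
        weights = F.softmax(weights, dim=-1)    # renormalize over the chosen 2
        out = torch.zeros_like(x)
        for k in range(self.top_k):             # only the selected experts run per token
            for e, expert in enumerate(self.experts):
                mask = idx[:, k] == e
                if mask.any():
                    out[mask] += weights[mask, k, None] * expert(x[mask])
        return out

# Example: route 4 tokens through the layer
layer = Top2MoELayer()
tokens = torch.randn(4, 512)
print(layer(tokens).shape)  # torch.Size([4, 512])
```

This is why the active parameter count (39B) is much smaller than the total (141B): every token passes through only 2 of the 8 expert MLPs, while the router and shared layers are always active.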