Scales the sparse mixture-of-experts (MoE) architecture to 141B total parameters with 39B active per token (8 experts, top-2 routing). 64K context window. MMLU: 77.3%. Outperforms Command R+ and Llama 2 70B. Apache 2.0 license.
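
To make the routing concrete, below is a minimal sketch of top-2 expert routing in PyTorch. The class name `Top2MoE` and the toy dimensions (`d_model=64`, `d_ff=256`) are illustrative assumptions, not the model's actual implementation; it only demonstrates the mechanism where a learned router selects 2 of 8 experts per token, so only a fraction of the total parameters participates in any forward pass.

```python
# Illustrative sketch of top-2 MoE routing; sizes and names are toy
# assumptions, not the production architecture.
import torch
import torch.nn as nn
import torch.nn.functional as F

class Top2MoE(nn.Module):
    def __init__(self, d_model=64, d_ff=256, n_experts=8, top_k=2):
        super().__init__()
        self.top_k = top_k
        # Gating network: one score per expert, per token.
        self.router = nn.Linear(d_model, n_experts, bias=False)
        # Each expert is a small feed-forward block.
        self.experts = nn.ModuleList([
            nn.Sequential(nn.Linear(d_model, d_ff), nn.SiLU(),
                          nn.Linear(d_ff, d_model))
            for _ in range(n_experts)
        ])

    def forward(self, x):                                  # x: (tokens, d_model)
        scores, idx = self.router(x).topk(self.top_k, -1)  # top-2 experts per token
        gates = F.softmax(scores, dim=-1)                  # renormalize over the chosen 2
        out = torch.zeros_like(x)
        for slot in range(self.top_k):
            for e, expert in enumerate(self.experts):
                mask = idx[:, slot] == e                   # tokens sent to expert e
                if mask.any():
                    out[mask] += gates[mask, slot].unsqueeze(-1) * expert(x[mask])
        return out

x = torch.randn(4, 64)            # 4 tokens
print(Top2MoE()(x).shape)         # torch.Size([4, 64])
```

Because only 2 of the 8 experts run for each token while non-expert weights are always active, per-token compute tracks the 39B active parameters rather than the 141B total.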

Model Details

Architecture: Mixture of Experts (MoE)
Total parameters: 141B
Active parameters: 39B per token
Context window: 64K tokens
Tags: moe, open-weight, frontier
