Scales the sparse mixture-of-experts (MoE) architecture to 141B total parameters with 39B active per token (8 experts, top-2 routing). 64K context window. MMLU: 77.3%. Outperforms Command R+ and Llama 2 70B. Apache 2.0 license.
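
To make the routing concrete, below is a minimal sketch of top-2 expert routing in PyTorch. The class name `Top2MoE` and the toy dimensions (`d_model=64`, `d_ff=256`) are illustrative assumptions, not the model's actual implementation; it only demonstrates the mechanism where a learned router selects 2 of 8 experts per token, so only a fraction of the total parameters participates in any forward pass.

```python
# Illustrative sketch of top-2 MoE routing; sizes and names are toy
# assumptions, not the production architecture.
import torch
import torch.nn as nn
import torch.nn.functional as F

class Top2MoE(nn.Module):
    def __init__(self, d_model=64, d_ff=256, n_experts=8, top_k=2):
        super().__init__()
        self.top_k = top_k
        # Gating network: one score per expert, per token.
        self.router = nn.Linear(d_model, n_experts, bias=False)
        # Each expert is a small feed-forward block.
        self.experts = nn.ModuleList([
            nn.Sequential(nn.Linear(d_model, d_ff), nn.SiLU(),
                          nn.Linear(d_ff, d_model))
            for _ in range(n_experts)
        ])

    def forward(self, x):                                  # x: (tokens, d_model)
        scores, idx = self.router(x).topk(self.top_k, -1)  # top-2 experts per token
        gates = F.softmax(scores, dim=-1)                  # renormalize over the chosen 2
        out = torch.zeros_like(x)
        for slot in range(self.top_k):
            for e, expert in enumerate(self.experts):
                mask = idx[:, slot] == e                   # tokens sent to expert e
                if mask.any():
                    out[mask] += gates[mask, slot].unsqueeze(-1) * expert(x[mask])
        return out

x = torch.randn(4, 64)            # 4 tokens
print(Top2MoE()(x).shape)         # torch.Size([4, 64])
```

Because only 2 of the 8 experts run for each token while non-expert weights are always active, per-token compute tracks the 39B active parameters rather than the 141B total.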

Model Details

Architecture: Mixture of Experts (MoE)
Total parameters: 141B
Active parameters: 39B per token
Context window: 64K tokens
Tags: moe, open-weight, frontier
