MoE flagship model with 236B total and 23B active parameters. It uses Multi-Token Prediction (MTP) for self-drafting: the model proposes its own upcoming tokens, the main model verifies them, and generation runs 1.5x faster.
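The card does not describe how self-drafting is implemented. The sketch below shows a generic greedy speculative-decoding loop in which a cheap drafter stands in for the MTP head. The names `self_draft_decode`, `main_next`, `draft_next` and the draft length `k` are hypothetical. The key property it shows: the output always matches the main model's own greedy output, and speed comes from accepting several tokens per verification pass.

```python
from typing import Callable, List, Tuple

Token = int
NextFn = Callable[[List[Token]], Token]  # context -> greedy next token (toy stand-in)


def self_draft_decode(main_next: NextFn, draft_next: NextFn,
                      prompt: List[Token], n_new: int, k: int = 1) -> Tuple[List[Token], int]:
    """Hypothetical greedy speculative decoding; draft_next plays the MTP head.

    Returns (generated tokens, number of verification passes).
    """
    seq = list(prompt)
    target_len = len(prompt) + n_new
    passes = 0
    while len(seq) < target_len:
        # 1. Draft k tokens ahead with the cheap head.
        draft, ctx = [], list(seq)
        for _ in range(k):
            t = draft_next(ctx)
            draft.append(t)
            ctx.append(t)
        # 2. Verify. A real model scores seq + draft in ONE forward pass;
        #    here we call main_next per position but count it as one pass.
        passes += 1
        for t in draft:
            if len(seq) >= target_len:
                break
            expected = main_next(seq)
            seq.append(expected)  # emitted token is always the main model's choice
            if t != expected:
                break  # first mismatch: discard the remaining drafts
        else:
            # Every draft was accepted: the same pass yields one bonus token.
            if len(seq) < target_len:
                seq.append(main_next(seq))
    return seq[len(prompt):], passes
```

With `k=1`, a perfect drafter yields two tokens per pass and a useless one yields one, so, ignoring drafting overhead, a 1.5x speedup would correspond to roughly half of single-token drafts being accepted. This is illustrative arithmetic under that assumption, not a figure from the card.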

Model Details

Architecture: MoE
Total parameters: 236B
Active parameters: 23B
AA Intelligence: 32
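The gap between total and active parameters is what makes the MoE design efficient: each token is routed through only a subset of the weights. A minimal arithmetic sketch, assuming the common rule of thumb of about 2 FLOPs per active parameter per token for a forward pass (it ignores attention over the context). The function name `moe_param_summary` is hypothetical.

```python
def moe_param_summary(total_params: float, active_params: float) -> dict:
    """Active fraction and rough per-token forward cost (~2 FLOPs per active param)."""
    if not 0 < active_params <= total_params:
        raise ValueError("need 0 < active_params <= total_params")
    return {
        "active_fraction": active_params / total_params,
        "approx_forward_flops_per_token": 2 * active_params,
        "dense_equivalent_flops_per_token": 2 * total_params,
    }


summary = moe_param_summary(236e9, 23e9)
print(f"{summary['active_fraction']:.1%}")  # 9.7%
```

Per-token compute therefore scales with the 23B active parameters, while all 236B weights still have to be stored and loaded to serve the model.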


Tags: moe, open-weight, efficiency
