Flagship Mixture-of-Experts (MoE) model with 236B total parameters, of which 23B are active per token. Uses Multi-Token Prediction (MTP) for self-drafting, giving a reported 1.5x faster generation.
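
As a rough illustration of what self-drafting means, the sketch below shows one generic draft-and-verify step under greedy decoding. It is a toy under stated assumptions, not this model's implementation: `self_draft_step`, `next_token`, and `draft_tokens` are hypothetical names, and the toy "model" simply counts upward modulo 10. The idea is that cheap guesses for the next few tokens are checked against the main prediction, and every token up to and including the first mismatch is kept.

```python
from typing import Callable, List


def self_draft_step(
    context: List[int],
    next_token: Callable[[List[int]], int],
    draft_tokens: Callable[[List[int], int], List[int]],
    k: int = 2,
) -> List[int]:
    """One draft-and-verify step (k >= 1); returns the tokens accepted this step."""
    draft = draft_tokens(context, k)  # cheap guesses for the next k tokens
    accepted: List[int] = []
    for guess in draft:
        # A real system would run these checks in one batched forward pass;
        # the loop here only makes the acceptance rule explicit.
        target = next_token(context + accepted)
        accepted.append(target)  # the verified token is always kept
        if guess != target:
            break  # first mismatch: the rest of the draft is discarded
    return accepted


# Toy stand-ins (assumptions for illustration only).
def toy_next_token(ctx: List[int]) -> int:
    return (ctx[-1] + 1) % 10


def good_draft(ctx: List[int], k: int) -> List[int]:
    return [(ctx[-1] + i + 1) % 10 for i in range(k)]


def bad_draft(ctx: List[int], k: int) -> List[int]:
    return [9] * k


two_tokens = self_draft_step([0], toy_next_token, good_draft, k=2)
one_token = self_draft_step([0], toy_next_token, bad_draft, k=2)
```

With an accurate draft the step emits two tokens, and with a wrong draft it still emits one correct token, so the output matches plain greedy decoding while the number of tokens per step depends on how often the draft is accepted.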

Model Details

Architecture: MoE
Parameters: 236B
Active params: 23B

Paper

arXiv: 2601.01739

Tags: moe, open-weight, efficiency

Related