MoE model with 32 experts (2 active per token), 40B total / 3.7B active parameters. Introduced an "Attention Router" for expert selection, reporting a 3.8% accuracy improvement over classical routers. Surpassed Llama3-70B on MATH and ARC-Challenge while requiring roughly 1/19th the compute. Trained on 2T tokens.
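For intuition, here is a minimal PyTorch sketch of attention-style expert routing: each token's query is scored against learned per-expert key embeddings, and the top-2 experts are activated. The projection names, dimensions, and scoring function here are illustrative assumptions, not the paper's exact "Attention Router" formulation (see arXiv:2405.17976 for the details).

```python
# Illustrative sketch only: an attention-style MoE router.
# Dimensions, names, and the scoring function are assumptions,
# not the exact formulation from arXiv:2405.17976.
import torch
import torch.nn as nn
import torch.nn.functional as F


class AttentionRouter(nn.Module):
    """Scores experts via attention between token queries and learned
    per-expert key embeddings, then selects the top-k experts."""

    def __init__(self, d_model: int, n_experts: int = 32, top_k: int = 2):
        super().__init__()
        self.top_k = top_k
        self.q_proj = nn.Linear(d_model, d_model, bias=False)  # token -> query
        # One learned key embedding per expert (hypothetical parameterization).
        self.expert_keys = nn.Parameter(torch.randn(n_experts, d_model) * d_model**-0.5)

    def forward(self, x: torch.Tensor):
        # x: (batch, d_model) token representations
        q = self.q_proj(x)                                      # (batch, d_model)
        scores = q @ self.expert_keys.t() * x.size(-1) ** -0.5  # (batch, n_experts)
        probs = F.softmax(scores, dim=-1)
        gate_w, expert_idx = probs.topk(self.top_k, dim=-1)    # top-2 experts
        gate_w = gate_w / gate_w.sum(dim=-1, keepdim=True)     # renormalize gates
        return gate_w, expert_idx


router = AttentionRouter(d_model=512)
weights, experts = router(torch.randn(4, 512))
print(experts.shape)  # torch.Size([4, 2]): two active experts per token
```

With 2 of 32 experts active per token, only the selected experts' FFNs run, which is how the model keeps 3.7B active parameters out of 40B total.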

Model Details

Architecture: MoE
Parameters: 40B
Active params: 3.7B

Paper

arXiv: 2405.17976

Tags: moe, open-weight, efficiency