72B-parameter MoE model with a novel Mixture of Grouped Experts (MoGE) architecture, activating 16B parameters per token. Open-sourced as part of the openPangu initiative.

Paper

moe, open-weight

Related