PanGu-Sigma
Trillion-parameter sparse language model (1.085T parameters) extending PanGu-alpha with Random Routed Experts (RRE). Trained on 329B tokens in 40+ languages on 512 Ascend 910 accelerators.
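As a rough illustration of the Random Routed Experts idea, the sketch below shows hash-based (learned-gate-free) expert selection within a domain group. This is a minimal assumption-laden sketch, not the PanGu-Sigma implementation; the function name `rre_route` and the domain/expert-count parameters are hypothetical.

```python
import hashlib

def rre_route(token_id: int, domain: str, num_experts_per_domain: int = 4) -> int:
    """Illustrative sketch (NOT the official implementation): route a token
    to one expert in its domain's group via a fixed hash, so no trainable
    gating network is needed and routing is stable across training steps."""
    key = f"{domain}:{token_id}".encode()
    digest = int.from_bytes(hashlib.sha256(key).digest()[:8], "big")
    return digest % num_experts_per_domain

# Deterministic: the same token in the same domain always hits the same expert.
assert rre_route(42, "code") == rre_route(42, "code")
assert 0 <= rre_route(7, "en") < 4
```

The design point this illustrates is that random (hash-based) routing removes the load-balancing and router-training overhead of learned gating, at the cost of input-adaptive expert choice.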
Architecture: MoE
Parameters: 1.1T
PanGu-Sigma: Towards Trillion Parameter Language Model with Sparse Heterogeneous Computing
Paper: arXiv:2303.10845