Trillion-parameter sparse language model (1.085T) extending PanGu-alpha with Random Routed Experts (RRE). Trained on 329B tokens in 40+ languages on 512 Ascend 910 accelerators.

Outputs 2

PanGu-Sigma

model
Architecture MoE
Parameters 1.1T
Training tokens 329B

PanGu-Sigma: Towards Trillion Parameter Language Model with Sparse Heterogeneous Computing

paper
Citations 7
nlp moe
