CPM-2
A large-scale, cost-effective pre-trained language model series (up to 198B parameters) that incorporates Mixture-of-Experts (MoE) layers and bilingual (Chinese-English) capabilities. Supported by BAAI, it achieved state-of-the-art results on both Chinese and English tasks while maintaining computational efficiency.
Model Details
Architecture: MoE (Mixture-of-Experts; see the sketch below)
Parameters: 198B
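For readers unfamiliar with the architecture, below is a minimal sketch of the MoE routing idea in PyTorch. It is a generic top-1 gating example under assumed layer names and sizes, not CPM-2's actual implementation.

```python
# A minimal sketch of a Mixture-of-Experts (MoE) feed-forward layer.
# Generic top-1 routing for illustration only; all names, sizes, and
# the gating scheme here are assumptions, not CPM-2's implementation.
import torch
import torch.nn as nn
import torch.nn.functional as F

class MoEFeedForward(nn.Module):
    def __init__(self, d_model: int, d_ff: int, num_experts: int):
        super().__init__()
        # One gating network scores every expert for each token.
        self.gate = nn.Linear(d_model, num_experts)
        # Each expert is an independent two-layer feed-forward block.
        self.experts = nn.ModuleList(
            nn.Sequential(
                nn.Linear(d_model, d_ff), nn.ReLU(), nn.Linear(d_ff, d_model)
            )
            for _ in range(num_experts)
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (num_tokens, d_model) -- tokens flattened for routing.
        scores = F.softmax(self.gate(x), dim=-1)   # (num_tokens, num_experts)
        top_score, top_idx = scores.max(dim=-1)    # top-1 expert per token
        out = torch.zeros_like(x)
        for i, expert in enumerate(self.experts):
            mask = top_idx == i                    # tokens routed to expert i
            if mask.any():
                # Weight each expert's output by its gate probability.
                out[mask] = top_score[mask].unsqueeze(-1) * expert(x[mask])
        return out

# Usage: route 8 token vectors through 4 experts.
layer = MoEFeedForward(d_model=16, d_ff=32, num_experts=4)
tokens = torch.randn(8, 16)
print(layer(tokens).shape)  # torch.Size([8, 16])
```

In an MoE model of this kind, the router dispatches each token to only a small subset of experts, so total parameter count can grow (here, up to 198B) without a proportional increase in per-token compute.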
Paper
arXiv: 2106.10715