Pangu Ultra
135B dense LLM trained entirely on Ascend NPUs (8,192 chips) on 13.2T tokens. Demonstrates that frontier-scale dense models can be trained on domestic Chinese hardware without NVIDIA GPUs.
Outputs (2)
Pangu Ultra 135B (model)
Architecture: Dense
Parameters: 135B
Pangu Ultra: Pushing the Limits of Dense Large Language Models on Ascend NPUs (paper)
arXiv: 2504.07866