Inspur (corporate)
YuanLab AI is the AI research division of Inspur (SZ: 000977), one of the world's largest server manufacturers. Originally the Inspur AI Research Institute, it rebranded as YuanLab around 2024–2025 to reflect its shift toward open-source, enterprise-grade foundation models. Unlike VC-backed startups, YuanLab is funded directly by Inspur's corporate capital, and it has a "home field advantage" in hardware optimization, since it builds the servers many other AI companies train on.
The Yuan model series traces a clear arc from massive dense models to efficient MoE:
- Yuan 1.0 (2021, 245B dense): the world's largest single-language Chinese model at launch.
- Yuan 2.0 (2023): introduced Localized Filtering-based Attention (LFA) across 2B–102B sizes.
- Yuan 2.0-M32 (2024): marked the MoE pivot with an Attention Router, achieving a 3.8% accuracy gain over classical expert routing.
- Yuan 3.0 Ultra (March 2026, 1.01T total / 68.8B active): the current flagship, pruned from 1.5T parameters using Layer-Adaptive Expert Pruning (LAEP), improving training efficiency by 49% and inference speed by 33%.
- Yuan 3.0 Flash (Dec 2025, 40B MoE): introduced the Reflection Inhibition Reward Mechanism (RIRM) to prevent overthinking, cutting inference costs by ~50%.
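To make the routing idea concrete: a classical MoE router scores each expert with an independent dot product against the token, while an attention-style router (as in Yuan 2.0-M32's Attention Router) lets the experts' embeddings interact before the top-k selection. The sketch below is an illustrative simplification under assumed shapes and names (`attention_router`, `expert_keys`, `expert_values` are mine), not the published formulation:

```python
import numpy as np

def attention_router(token, expert_keys, expert_values, k=2):
    """Illustrative attention-style MoE router (a simplified sketch,
    NOT the exact Yuan 2.0-M32 method): expert scores are adjusted by
    a shared context vector so they reflect inter-expert correlation,
    rather than being independent dot products."""
    d = token.shape[0]
    # direct token-expert affinities, scaled like attention logits
    logits = expert_keys @ token / np.sqrt(d)
    # softmax over experts, then mix their value vectors into one context
    weights = np.exp(logits - logits.max())
    weights /= weights.sum()
    context = expert_values.T @ weights
    # final score: direct affinity plus agreement with the shared context
    scores = logits + expert_values @ context
    # pick the k highest-scoring experts and renormalize their gates
    top_k = np.argsort(scores)[-k:][::-1]
    gate = np.exp(scores[top_k] - scores[top_k].max())
    gate /= gate.sum()
    return top_k, gate

rng = np.random.default_rng(0)
n_experts, d_model = 32, 64
keys = rng.normal(size=(n_experts, d_model))
vals = rng.normal(size=(n_experts, d_model))
tok = rng.normal(size=d_model)
idx, gate = attention_router(tok, keys, vals, k=2)
```

With 32 experts and 2 active (the M32 configuration), only the two selected experts' FFNs run per token, which is why a 40B-class MoE can match a much smaller dense model's inference cost.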
People
- Shaohua Wu (Shawn Wu) — Lead Researcher