Yuan 3.0 Flash
Model
Open-source multimodal MoE LLM (40B total / 3.7B active) for enterprise applications. Introduces Reflection-aware Adaptive Policy Optimization (RAPO) and a Reflection Inhibition Reward Mechanism (RIRM) to curb overthinking, cutting inference costs by roughly 50%. Designed for "intelligence too cheap to meter."
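The RIRM idea above can be sketched as reward shaping that penalizes redundant reflection during policy optimization. This is an illustrative toy only: the paper's actual formulation is not reproduced here, and the marker list, function name, and penalty weight are assumptions.

```python
# Hypothetical sketch of a reflection-inhibition reward: subtract a penalty
# proportional to the number of "reflection" phrases in a sampled response,
# so the policy is discouraged from unnecessary re-thinking.

# Assumed marker phrases; the real mechanism is not specified in this card.
REFLECTION_MARKERS = ("wait", "let me reconsider", "on second thought")

def reflection_inhibited_reward(task_reward: float, response: str,
                                penalty_weight: float = 0.1) -> float:
    """Return the task reward minus a penalty for reflection phrases."""
    text = response.lower()
    n_reflections = sum(text.count(m) for m in REFLECTION_MARKERS)
    return task_reward - penalty_weight * n_reflections
```

A concise correct answer keeps its full reward, while a response padded with repeated self-reconsideration is scored lower, which is the intuition behind inhibiting overthinking at training time.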
Model Details
Architecture MoE
Parameters 40B
Active params 3.7B
Paper
arXiv: 2601.01718