An open-source multimodal Mixture-of-Experts (MoE) LLM (40B total parameters, 3.7B active) for enterprise applications. It introduces Reflection-aware Adaptive Policy Optimization (RAPO) and a Reflection Inhibition Reward Mechanism (RIRM) to prevent overthinking, cutting inference costs by roughly 50%. Designed for "intelligence too cheap to meter."

Model Details

Architecture MoE
Parameters 40B
Active params 3.7B
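The gap between total and active parameters comes from sparse expert routing: a gating network scores the experts for each token and only the top-k run. The sketch below is a minimal, hypothetical illustration of that mechanism (it is not this model's actual code, and the expert count and top-k values are made up for demonstration).

```python
import numpy as np

# Hypothetical sparse-MoE routing sketch. A gate scores all experts per
# token; only the top-k experts execute, so active params << total params.
rng = np.random.default_rng(0)
d_model, n_experts, top_k = 16, 8, 2

# One expert = one dense layer; total params = n_experts * d_model * d_model.
experts = [rng.standard_normal((d_model, d_model)) for _ in range(n_experts)]
gate_w = rng.standard_normal((d_model, n_experts))

def moe_forward(x):
    """Route token x through its top-k experts, weighted by gate scores."""
    logits = x @ gate_w
    top = np.argsort(logits)[-top_k:]          # indices of the k best experts
    w = np.exp(logits[top])
    w /= w.sum()                               # renormalized softmax weights
    y = sum(wi * (x @ experts[i]) for wi, i in zip(w, top))
    return y, top

x = rng.standard_normal(d_model)
y, used = moe_forward(x)

total_params = n_experts * d_model * d_model
active_params = top_k * d_model * d_model
print(f"experts used: {sorted(used.tolist())}")
print(f"active fraction: {active_params / total_params:.2f}")  # 2/8 = 0.25
```

In this toy setup the active fraction is top_k / n_experts = 0.25; the card's 3.7B-of-40B figure (~9%) implies a sparser ratio in the real model.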

Paper

arXiv: 2601.01718

moe, open-weight, multimodal, enterprise, efficiency
