An open-source multimodal Mixture-of-Experts (MoE) LLM (40B total parameters, 3.7B active) for enterprise applications. It introduces Reflection-aware Adaptive Policy Optimization (RAPO) and a Reflection Inhibition Reward Mechanism (RIRM) to prevent overthinking, cutting inference costs by roughly 50%. Designed for "intelligence too cheap to meter."

Model Details

Architecture MoE
Parameters 40B
Active params 3.7B
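The gap between total and active parameters comes from sparse expert routing: a gating network scores the experts for each token and only the top-k run. The sketch below is a minimal, hypothetical illustration of that mechanism (it is not this model's actual code, and the expert count and top-k values are made up for demonstration).

```python
import numpy as np

# Hypothetical sparse-MoE routing sketch. A gate scores all experts per
# token; only the top-k experts execute, so active params << total params.
rng = np.random.default_rng(0)
d_model, n_experts, top_k = 16, 8, 2

# One expert = one dense layer; total params = n_experts * d_model * d_model.
experts = [rng.standard_normal((d_model, d_model)) for _ in range(n_experts)]
gate_w = rng.standard_normal((d_model, n_experts))

def moe_forward(x):
    """Route token x through its top-k experts, weighted by gate scores."""
    logits = x @ gate_w
    top = np.argsort(logits)[-top_k:]          # indices of the k best experts
    w = np.exp(logits[top])
    w /= w.sum()                               # renormalized softmax weights
    y = sum(wi * (x @ experts[i]) for wi, i in zip(w, top))
    return y, top

x = rng.standard_normal(d_model)
y, used = moe_forward(x)

total_params = n_experts * d_model * d_model
active_params = top_k * d_model * d_model
print(f"experts used: {sorted(used.tolist())}")
print(f"active fraction: {active_params / total_params:.2f}")  # 2/8 = 0.25
```

In this toy setup the active fraction is top_k / n_experts = 0.25; the card's 3.7B-of-40B figure (~9%) implies a sparser ratio in the real model.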

Paper

arXiv: 2601.01718

moe, open-weight, multimodal, enterprise, efficiency
