InternVL3.5 advances open-source multimodal models in versatility, reasoning, and efficiency. It introduces Cascade Reinforcement Learning (combining offline and online RL stages), a Visual Resolution Router that dynamically adjusts the number of visual tokens, and Decoupled Vision-Language Deployment. Reports up to a 16% gain in overall reasoning performance and a 4x inference speedup over InternVL3. Includes native vision-language-action capabilities.

Model Details

Architecture: MoE

Variants

Name                     Parameters   Notes
InternVL3_5-1B           1B
InternVL3_5-8B           8B
InternVL3_5-38B          38B
InternVL3_5-241B-A28B    241B         MoE architecture

Paper

arXiv: 2508.18265

multimodal, open-weight, vision, frontier, reasoning, moe
