InternVL 3.5
InternVL 3.5 advances open-source multimodal models in versatility, reasoning, and efficiency. It introduces Cascade Reinforcement Learning (an offline RL stage combined with an online RL stage), a Visual Resolution Router that dynamically adjusts the number of visual tokens, and Decoupled Vision-Language Deployment. It reports up to a 16% performance gain and a 4x inference speedup over InternVL3, and includes native vision-language-action capabilities.
Model Details
Architecture: MoE
Variants
| Name | Parameters | Notes |
|---|---|---|
| InternVL3_5-1B | 1B | — |
| InternVL3_5-8B | 8B | — |
| InternVL3_5-38B | 38B | — |
| InternVL3_5-241B-A28B | 241B | MoE architecture |
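
The variant names in the table encode parameter counts, and a small helper can make that explicit. The following is a minimal sketch, not part of any InternVL tooling: `parse_variant` and `active_fraction` are hypothetical names, and the reading of the `A28B` suffix as "28B activated parameters per token" is an assumption based on common MoE naming conventions, not something stated in the table itself.

```python
import re
from typing import NamedTuple, Optional


class Variant(NamedTuple):
    name: str
    total_b: float             # total parameters, in billions
    active_b: Optional[float]  # activated parameters, in billions (None if no -A suffix)


# Matches names such as "InternVL3_5-8B" or "InternVL3_5-241B-A28B".
# ASSUMPTION: an "-A<n>B" suffix gives the activated parameter count of an MoE model.
_PATTERN = re.compile(r"^InternVL3_5-(\d+(?:\.\d+)?)B(?:-A(\d+(?:\.\d+)?)B)?$")


def parse_variant(name: str) -> Variant:
    """Split a variant name into total and (optional) activated parameter counts."""
    m = _PATTERN.match(name)
    if m is None:
        raise ValueError(f"unrecognized variant name: {name!r}")
    total = float(m.group(1))
    active = float(m.group(2)) if m.group(2) else None
    return Variant(name, total, active)


def active_fraction(v: Variant) -> float:
    """Fraction of parameters used per token; 1.0 when no -A suffix is present."""
    return 1.0 if v.active_b is None else v.active_b / v.total_b


if __name__ == "__main__":
    for n in ["InternVL3_5-1B", "InternVL3_5-8B", "InternVL3_5-38B", "InternVL3_5-241B-A28B"]:
        v = parse_variant(n)
        print(f"{v.name}: total={v.total_b}B, active fraction={active_fraction(v):.3f}")
```

Under that assumption, the 241B variant would activate roughly 12% of its parameters per token, which is why its inference cost is closer to that of a much smaller dense model than its total size suggests.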
Paper
arXiv: 2508.18265