InternVL3
Introduces native joint multimodal pre-training, variable visual position encoding (V2PE), mixed preference optimization (MPO), and test-time scaling. InternVL3-78B scores 72.2 on the MMMU benchmark, competitive with GPT-4o and Claude 3.5 Sonnet.
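The variable visual position encoding idea can be shown with a minimal sketch. It assumes the V2PE formulation in which each text token advances the position index by 1 and each visual token advances it by a smaller step δ, so long image sequences use up less of the positional range. The function names, the starting index of 0, and the candidate δ values (1 down to 1/256) are illustrative assumptions, not InternVL3's released code.

```python
import random
from fractions import Fraction

# Hypothetical candidate step sizes for visual tokens: 1, 1/2, 1/4, ..., 1/256.
DELTA_CHOICES = [Fraction(1, 2**k) for k in range(9)]


def sample_delta(rng: random.Random) -> Fraction:
    """Pick a visual position step at random (e.g. once per training sample)."""
    return rng.choice(DELTA_CHOICES)


def v2pe_position_ids(is_visual: list[bool], delta: Fraction) -> list[Fraction]:
    """Assign position indices: text tokens step by 1, visual tokens by delta.

    The first token is assumed to sit at position 0; every later token i gets
    p_i = p_{i-1} + (delta if token i is visual else 1).
    """
    positions: list[Fraction] = []
    p = Fraction(0)
    for i, visual in enumerate(is_visual):
        if i > 0:
            p += delta if visual else 1
        positions.append(p)
    return positions


if __name__ == "__main__":
    # One text token, four image-patch tokens, one text token, with delta = 1/4.
    mask = [False, True, True, True, True, False]
    print(v2pe_position_ids(mask, Fraction(1, 4)))
```

With δ = 1/4, the four visual tokens occupy positions 1/4 through 1, so they consume one unit of positional range instead of four. Fractions keep the arithmetic exact here. A real model would use floating-point position ids fed into its rotary embedding.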
Model Details
Architecture: Dense
Variants
| Name | Parameters | Notes |
|---|---|---|
| InternVL3-1B | 1B | — |
| InternVL3-8B | 8B | — |
| InternVL3-38B | 38B | — |
| InternVL3-78B | 78B | — |
Paper
arXiv: 2504.10479