InternVL3

InternVL3 introduces native joint multimodal pre-training, variable visual position encoding, mixed preference optimization, and test-time scaling. InternVL3-78B scores 72.2 on the MMMU benchmark, competitive with GPT-4o and Claude 3.5 Sonnet.
