InternVL 1.5
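Introduced dynamic high-resolution processing (inputs up to 4K resolution, split into 1 to 40 tiles of 448×448 pixels), a continuous learning strategy for the InternViT-6B vision encoder, and a high-quality bilingual (English and Chinese) training dataset. Achieved state-of-the-art results on 8 of 18 benchmarks, narrowing the gap to GPT-4V.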
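The tiling idea can be illustrated with a minimal sketch. It assumes the grid is chosen by matching the image's aspect ratio to the closest cols × rows layout with at most 40 tiles, and that ties go to the layout with the fewest tiles. The names `choose_grid` and `tile_boxes` and the tie-breaking rule are hypothetical, chosen for illustration; this is not the authors' implementation.

```python
from typing import List, Tuple

TILE = 448       # tile side in pixels, as stated in the description
MAX_TILES = 40   # upper bound on tiles per image, as stated in the description

Grid = Tuple[int, int]            # (cols, rows)
Box = Tuple[int, int, int, int]   # (left, top, right, bottom)


def choose_grid(width: int, height: int, max_tiles: int = MAX_TILES) -> Grid:
    """Pick the (cols, rows) grid whose aspect ratio is closest to the image's.

    Assumption for this sketch: ties are resolved in favour of the grid
    with the fewest tiles.
    """
    if width <= 0 or height <= 0:
        raise ValueError("width and height must be positive")
    aspect = width / height
    # Enumerate every grid with cols * rows <= max_tiles, smallest grids first.
    grids = sorted(
        ((c, r)
         for c in range(1, max_tiles + 1)
         for r in range(1, max_tiles // c + 1)),
        key=lambda g: (g[0] * g[1], g[0]),
    )
    best = grids[0]  # (1, 1)
    best_diff = abs(aspect - best[0] / best[1])
    for cols, rows in grids[1:]:
        diff = abs(aspect - cols / rows)
        if diff < best_diff:  # strict comparison keeps the smaller grid on ties
            best, best_diff = (cols, rows), diff
    return best


def tile_boxes(grid: Grid, tile: int = TILE) -> List[Box]:
    """Crop boxes, in row-major order, for an image resized to
    (cols * tile) x (rows * tile) pixels."""
    cols, rows = grid
    return [
        (c * tile, r * tile, (c + 1) * tile, (r + 1) * tile)
        for r in range(rows)
        for c in range(cols)
    ]


if __name__ == "__main__":
    grid = choose_grid(3840, 2160)  # a 4K frame
    print(grid, len(tile_boxes(grid)))
```

Under these assumptions, a 3840×2160 input maps to a 7×4 grid: the image would be resized to 3136×1792 and cut into 28 tiles of 448×448.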
Model Details
Architecture: Dense
Parameters: 26B
Paper
arXiv: 2404.16821