"Infinity-MM: Scaling Multimodal Performance with Large-Scale and High-Quality Instruction Data." An open multimodal instruction dataset with more than 40M samples, including 10M image descriptions and 24.4M visual instruction examples. The paper also introduces a synthetic-instruction generation pipeline, built on a tagging system and open-source VLMs, that lets high-quality data be expanded continuously at scale.
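The generation pipeline is only summarized above. The sketch below is a toy illustration of the general idea: image tags select question types, a VLM writes instruction-response pairs, and duplicates are dropped. Everything in it is an assumption for illustration, not the paper's method: `TAG_TO_QUESTION_TYPES`, the prompt strings, `generate_samples`, `deduplicate`, and the `vlm` callable (a stand-in for any open-source VLM) are all hypothetical.

```python
import random
from dataclasses import dataclass
from typing import Callable

# Hypothetical tag -> question-type table (NOT the paper's actual taxonomy).
TAG_TO_QUESTION_TYPES: dict[str, list[str]] = {
    "chart": ["data extraction", "trend reasoning"],
    "text": ["OCR", "document QA"],
    "animal": ["object counting", "attribute recognition"],
}


@dataclass
class Sample:
    image_id: str
    question_type: str
    instruction: str
    response: str


def generate_samples(
    image_id: str,
    tags: list[str],
    vlm: Callable[[str], str],  # hypothetical stand-in for an open-source VLM
    per_image: int = 2,
    seed: int = 0,
) -> list[Sample]:
    """Pick question types implied by the image's tags, then ask the VLM
    to write one instruction and one response per chosen type."""
    rng = random.Random(seed)
    # Union of question types for all known tags; unknown tags contribute nothing.
    candidates = sorted({qt for t in tags for qt in TAG_TO_QUESTION_TYPES.get(t, [])})
    chosen = rng.sample(candidates, k=min(per_image, len(candidates)))
    samples = []
    for qt in chosen:
        instruction = vlm(f"Write a {qt} question about image {image_id}.")
        response = vlm(f"Answer for image {image_id}: {instruction}")
        samples.append(Sample(image_id, qt, instruction, response))
    return samples


def deduplicate(samples: list[Sample]) -> list[Sample]:
    """Drop samples whose (image, normalized instruction) pair was already seen."""
    seen: set[tuple[str, str]] = set()
    kept = []
    for s in samples:
        key = (s.image_id, s.instruction.strip().lower())
        if key not in seen:
            seen.add(key)
            kept.append(s)
    return kept


if __name__ == "__main__":
    stub_vlm = lambda prompt: f"stub: {prompt}"  # echoes the prompt; no real model
    batch = generate_samples("img1", ["chart", "text"], stub_vlm)
    for s in deduplicate(batch + batch):
        print(s.question_type, "|", s.instruction)
```

A real pipeline would replace the stub with model calls and add stronger quality filters than exact-match deduplication; the fixed `seed` only keeps the toy example reproducible.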

Used to train Aquila-VL-2B, which the authors report as state of the art among 2B-scale VLMs on MMBench, RealWorldQA, and ScienceQA. Released under CC-BY-SA-4.0.

Size: 40M+ samples
Format: image-text
License: CC-BY-SA-4.0
Tags: training-data, training, multimodal
