Aquila-VL-2B
A 2B-parameter vision-language model built on the LLaVA-OneVision framework, with Qwen2.5-1.5B-instruct as the LLM and SigLIP-SO400M as the vision tower. Trained on the Infinity-MM dataset (~40M image-text pairs). It is the first model to earn LF AI & Data's Model Openness Framework (MOF) Class I "Open Science" rating, and it achieves state-of-the-art performance among models of the same scale on the MMBench, RealWorldQA, and ScienceQA benchmarks.
Model Details
Variants
| Name | Parameters | Notes |
|---|---|---|
| Aquila-VL-2B-llava-qwen | — | — |
| Aquila-VL-2B-Intermediate | — | — |