State-of-the-art multimodal embedding models for visual search, covering text-to-image, image-to-text, image+prompt-to-image, and text-to-image+text retrieval. Achieves state-of-the-art results on MMEB (Massive Multimodal Embedding Benchmark), which spans 36 multimodal embedding evaluation tasks. Trained on the MegaPairs synthetic dataset (26M+ samples).
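All of these retrieval modes reduce to the same final step in an embedding model. The query and the candidates are encoded into a shared vector space, and the candidates are then ranked by similarity. The sketch below shows only that ranking step. It assumes cosine similarity, which is common for embedding models but is not stated above. The 3-dimensional toy vectors stand in for real BGE-VL embeddings, and `rank_candidates` is a hypothetical helper, not part of any BGE-VL API.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def rank_candidates(query_emb, candidate_embs):
    """Return candidate indices ordered from most to least similar to the query."""
    scores = [cosine(query_emb, c) for c in candidate_embs]
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)

# Toy embeddings (hypothetical); in practice both sides come from the same model.
query = [1.0, 0.0, 0.0]
candidates = [[0.0, 1.0, 0.0], [0.9, 0.1, 0.0], [0.5, 0.5, 0.0]]
print(rank_candidates(query, candidates))  # [1, 2, 0]
```

Because queries and candidates share one space, the same ranking code serves text-to-image, image-to-text, and composed (image+prompt) queries. Only the encoder input changes.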


BGE-VL (Multimodal Embedding)

model

State-of-the-art multimodal embedding models for visual search. BGE-VL-MLLM improves on the prior state of the art by 8.1 percentage points on CIRCO, a composed image retrieval benchmark. Released under the MIT license.
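A gain of "8.1 percentage points" is an absolute difference between two scores, not a relative improvement. The sketch below contrasts the two measures. The baseline and new scores (30.0 and 38.1) are invented for illustration and are not the actual CIRCO results. The helper names are hypothetical.

```python
def absolute_gain_pp(baseline, new):
    """Gain in percentage points: the plain difference of two scores on a 0-100 scale."""
    return new - baseline

def relative_gain_pct(baseline, new):
    """Gain relative to the baseline score, expressed in percent."""
    return (new - baseline) / baseline * 100.0

baseline, new = 30.0, 38.1  # hypothetical scores, not reported results
print(round(absolute_gain_pp(baseline, new), 1))   # 8.1
print(round(relative_gain_pct(baseline, new), 1))  # 27.0
```

The same 8.1-point gain would correspond to a larger relative improvement over a lower baseline. This is why the absolute figure is the less ambiguous way to report it.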

Variants

Name              Notes
BGE-VL-base
BGE-VL-large
BGE-VL-MLLM-S1    Trained on MegaPairs only
BGE-VL-MLLM-S2    Fully fine-tuned variant
BGE-VL-v1.5-zs    Zero-shot variant
BGE-VL-v1.5-mmeb  MMEB fine-tuned variant

MegaPairs

dataset

Large-scale synthetic dataset of 26M+ multimodal retrieval triplets. Accepted as an oral presentation at ACL 2025. Released under the MIT license.
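The sketch below assumes that each triplet pairs a query image and a text instruction with a target image. That layout is an assumption here and is not stated above. The sketch also shows one standard way to use such triplets to train an embedding model: an InfoNCE contrastive loss with in-batch negatives. This is a generic objective, not a reproduction of BGE-VL's actual training recipe. `RetrievalTriplet`, `info_nce`, the temperature of 0.05, and the file names are all hypothetical.

```python
import math
from dataclasses import dataclass

@dataclass(frozen=True)
class RetrievalTriplet:
    # Hypothetical layout; field names are illustrative, not MegaPairs' actual schema.
    query_image: str   # path or ID of the query image
    instruction: str   # text describing how the target relates to the query
    target_image: str  # path or ID of the target image

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def info_nce(query_embs, target_embs, temperature=0.05):
    """Mean InfoNCE loss with in-batch negatives.

    query_embs[i] embeds (query_image, instruction) of triplet i and
    target_embs[i] embeds its target_image; every other target in the
    batch acts as a negative. Embeddings are assumed L2-normalized.
    """
    losses = []
    for i, q in enumerate(query_embs):
        logits = [dot(q, t) / temperature for t in target_embs]
        m = max(logits)  # subtract the max for a numerically stable log-sum-exp
        log_sum_exp = m + math.log(sum(math.exp(x - m) for x in logits))
        losses.append(log_sum_exp - logits[i])  # -log softmax prob. of the positive
    return sum(losses) / len(losses)

# Hypothetical triplet; file names and instruction are invented for illustration.
example = RetrievalTriplet("img_001.jpg", "the same dog, photographed indoors", "img_042.jpg")
queries = [[1.0, 0.0], [0.0, 1.0]]
print(info_nce(queries, [[1.0, 0.0], [0.0, 1.0]]))  # near 0: each query matches its target
print(info_nce(queries, [[0.0, 1.0], [1.0, 0.0]]))  # about 20: targets are swapped
```

In-batch negatives are a common design choice because they reuse the other targets in the batch as negatives and need no extra encoding cost. A pipeline that also mines harder negatives would append those as additional columns of `logits`.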

Tags: embeddings, multimodal, open-weight, evaluation

Related