MiMo-VL
model, paper
Open-source vision-language model that outperformed models ten times its size (like Qwen-72B) in multimodal reasoning. Includes SFT and RL variants.
Outputs 2
MiMo-VL: From Pre-training to Post-training
paper
Technical report on achieving SOTA multimodal reasoning at the 7B scale.
arXiv: 2506.03569