Open-source vision-language model that pairs the Moonlight MoE language model (2.8B active / 16B total parameters) with a 400M-parameter MoonViT vision encoder. Kimi-VL-Thinking achieves 64.0 on MMMU.

Outputs 2

Kimi-VL-A3B-Thinking

model
Architecture: MoE
Parameters: 16B
Active params: 2.8B

Kimi-VL Technical Report

paper

arXiv: 2504.07491

multimodal, moe, open-weight