Seed1.5-VL
Vision-language foundation model with a 532M-parameter vision encoder and a Mixture-of-Experts (MoE) LLM with 20B active parameters. Achieves state-of-the-art results on 38 of 60 public VLM benchmarks. Excels at GUI control, gameplay, and visual reasoning tasks.
Model Details
Architecture MoE
Parameters 200B
Active params 20B
Paper
arXiv: 2505.07062