309B MoE model (15B active) optimized for extreme inference speed (150 tokens/sec) and low-cost API access. Introduces a hybrid attention architecture interleaving Sliding Window Attention (SWA) with global attention at a 5:1 ratio, achieving a nearly 6x reduction in KV-cache storage. Pre-trained on 27 trillion tokens with Multi-Token Prediction. Uses Multi-Teacher On-Policy Distillation (MOPD) for efficient post-training. Rivals DeepSeek-V3.2 and Kimi-K2 while using one-half to one-third as many parameters.
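
The quoted KV-cache saving follows from simple arithmetic over the layer pattern: with five sliding-window layers for every global layer, only one layer in six caches the full context. The sketch below illustrates that calculation; the layer count, window size, and context length are illustrative assumptions, not values from the report.

```python
# Back-of-the-envelope KV-cache estimate for a 5:1 SWA-to-global layer pattern.
# All numbers here (layer count, window size, context length) are illustrative
# assumptions, not values from the technical report.

def kv_cache_tokens(num_layers: int, context_len: int, window: int, swa_per_global: int = 5) -> int:
    """Total cached key/value positions across all layers for one sequence."""
    cycle = swa_per_global + 1                      # e.g. 5 SWA layers + 1 global layer
    num_global = num_layers // cycle
    num_swa = num_layers - num_global
    # Global layers cache every position; SWA layers only cache their window.
    return num_global * context_len + num_swa * min(window, context_len)

# Hypothetical configuration for illustration only.
layers, ctx, win = 48, 128_000, 2_048
hybrid = kv_cache_tokens(layers, ctx, win)
full = layers * ctx                                 # all-global baseline
print(f"hybrid/full = {hybrid / full:.3f}")         # ~0.18, i.e. roughly a 5-6x reduction
```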

Outputs (2)

MiMo-V2-Flash

model
Architecture MoE
Total parameters 309B
Active parameters 15B

MiMo-V2-Flash: Unlocking Extreme Inference Efficiency

paper

Technical report detailing hybrid SWA + global attention, Multi-Token Prediction, and Multi-Teacher On-Policy Distillation (MOPD).
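
For readers unfamiliar with Multi-Token Prediction, the sketch below shows a minimal version of such a training objective: auxiliary heads predict tokens several positions ahead in addition to the standard next-token loss. The head count, loss weights, and shapes are assumptions for illustration, not the report's exact recipe.

```python
# Minimal sketch of a multi-token prediction (MTP) objective, assuming one extra
# prediction head per future offset. Weights and shapes are illustrative only.
import torch.nn.functional as F

def mtp_loss(hidden, heads, targets, num_future: int = 2, aux_weight: float = 0.3):
    """hidden: [batch, seq, dim]; heads: list of Linear(dim -> vocab) modules;
    targets: [batch, seq] token ids. Head k predicts the token k+1 steps ahead."""
    total = 0.0
    for k in range(num_future):
        logits = heads[k](hidden[:, : hidden.size(1) - (k + 1)])  # drop last k+1 positions
        labels = targets[:, k + 1 :]                              # shift targets by k+1
        ce = F.cross_entropy(logits.reshape(-1, logits.size(-1)), labels.reshape(-1))
        total = total + (1.0 if k == 0 else aux_weight) * ce      # next-token loss + weighted auxiliary terms
    return total
```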

arXiv: 2601.02780

moe · efficiency · open-weight
