"Mercury: Ultra-Fast Language Models Based on Diffusion." Introduces diffusion-based LLMs (dLLMs), which predict multiple tokens in parallel through iterative denoising rather than generating them one at a time autoregressively. Mercury Coder Mini achieves 1,109 tokens/sec on an H100.
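The decoding loop can be illustrated with a toy sketch. This is not Mercury's actual algorithm; it only shows the general shape of confidence-based parallel denoising: start from a fully masked sequence, and at each step commit the most confident predictions for masked positions while re-masking the rest, so many tokens are finalized per step instead of one. The scorer here is a random stand-in for the transformer's per-position predictions.

```python
import random

MASK = "<mask>"

def toy_predict(tokens, vocab, rng):
    # Stand-in for the model: a real dLLM scores every masked position
    # in one forward pass; here we return a random token and a random
    # confidence for each masked slot.
    return {i: (rng.choice(vocab), rng.random())
            for i, t in enumerate(tokens) if t == MASK}

def diffusion_decode(length, vocab, steps=4, seed=0):
    """Toy parallel decoding: over a fixed number of denoising steps,
    commit the highest-confidence predictions and leave the rest
    masked for later steps."""
    rng = random.Random(seed)
    tokens = [MASK] * length
    for step in range(steps):
        preds = toy_predict(tokens, vocab, rng)
        if not preds:
            break
        # Commit about 1/(remaining steps) of the masked positions,
        # keeping the most confident predictions first.
        k = max(1, len(preds) // (steps - step))
        best = sorted(preds.items(), key=lambda kv: -kv[1][1])[:k]
        for i, (tok, _) in best:
            tokens[i] = tok
    # Final pass: fill any positions still masked.
    for i, (tok, _) in toy_predict(tokens, vocab, rng).items():
        tokens[i] = tok
    return tokens

out = diffusion_decode(8, ["a", "b", "c"])
```

With `length=8` and `steps=4`, each step finalizes roughly two tokens at once; an autoregressive decoder would need eight steps for the same sequence.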

Mercury 2 (February 2026) adds reasoning capability, scoring 33 on the AA Intelligence Index at ~929 tok/s, roughly 10x faster than comparable autoregressive models of similar quality. A genuinely novel architecture paradigm. By Khanna, Kharbanda, Li, Ermon, Grover, Kuleshov et al.

Model Details

Context window: 128,000 tokens

Variants

| Name          | Parameters | Notes                              |
| ------------- | ---------- | ---------------------------------- |
| Mercury Coder |            |                                    |
| Mercury 2     |            | Reasoning, AA index 33, 929 tok/s  |

Paper

arXiv: 2506.17298

Tags: foundational, reasoning, efficiency