"Mercury: Ultra-Fast Language Models Based on Diffusion." Introduces diffusion-based LLMs (dLLMs), which predict multiple tokens in parallel through iterative denoising rather than generating them one at a time autoregressively. Mercury Coder Mini achieves 1,109 tokens/sec on an H100.
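The decoding loop can be illustrated with a toy sketch. This is not Mercury's actual algorithm; it only shows the general shape of confidence-based parallel denoising: start from a fully masked sequence, and at each step commit the most confident predictions for masked positions while re-masking the rest, so many tokens are finalized per step instead of one. The scorer here is a random stand-in for the transformer's per-position predictions.

```python
import random

MASK = "<mask>"

def toy_predict(tokens, vocab, rng):
    # Stand-in for the model: a real dLLM scores every masked position
    # in one forward pass; here we return a random token and a random
    # confidence for each masked slot.
    return {i: (rng.choice(vocab), rng.random())
            for i, t in enumerate(tokens) if t == MASK}

def diffusion_decode(length, vocab, steps=4, seed=0):
    """Toy parallel decoding: over a fixed number of denoising steps,
    commit the highest-confidence predictions and leave the rest
    masked for later steps."""
    rng = random.Random(seed)
    tokens = [MASK] * length
    for step in range(steps):
        preds = toy_predict(tokens, vocab, rng)
        if not preds:
            break
        # Commit about 1/(remaining steps) of the masked positions,
        # keeping the most confident predictions first.
        k = max(1, len(preds) // (steps - step))
        best = sorted(preds.items(), key=lambda kv: -kv[1][1])[:k]
        for i, (tok, _) in best:
            tokens[i] = tok
    # Final pass: fill any positions still masked.
    for i, (tok, _) in toy_predict(tokens, vocab, rng).items():
        tokens[i] = tok
    return tokens

out = diffusion_decode(8, ["a", "b", "c"])
```

With `length=8` and `steps=4`, each step finalizes roughly two tokens at once; an autoregressive decoder would need eight steps for the same sequence.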

Mercury 2 (February 2026) adds reasoning capability, scoring 33 on the AA Intelligence Index at ~929 tok/s, roughly 10x faster than comparable autoregressive models of similar quality. A genuinely novel architecture paradigm. By Khanna, Kharbanda, Li, Ermon, Grover, Kuleshov et al.

Model Details

Context window: 128,000 tokens

Variants

| Name          | Parameters | Notes                              |
| ------------- | ---------- | ---------------------------------- |
| Mercury Coder |            |                                    |
| Mercury 2     |            | Reasoning, AA index 33, 929 tok/s  |

Paper

arXiv: 2506.17298

Tags: foundational, reasoning, efficiency