Mercury (Diffusion LLM)
Paper: "Mercury: Ultra-Fast Language Models Based on Diffusion." Introduces diffusion-based LLMs (dLLMs) that predict multiple tokens in parallel via iterative denoising, rather than generating them sequentially as autoregressive models do. Mercury Coder Mini achieves 1,109 tokens/sec on an NVIDIA H100.
Mercury 2 (February 2026) adds reasoning capability with AA Intelligence Index 33 at ~929 tok/s — roughly 10x faster than comparable autoregressive models at similar quality. A genuinely novel architecture paradigm. By Khanna, Kharbanda, Li, Ermon, Grover, Kuleshov et al.
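The denoising loop above can be sketched in miniature. Mercury's actual sampler and model internals are not public, so this is a generic masked-diffusion decoding sketch under assumed mechanics: start from a fully masked sequence, have the denoiser score every masked position in one parallel pass, and commit the most confident predictions each step (the `toy_model`, `VOCAB`, and confidence scheme are all hypothetical stand-ins).

```python
import random

MASK = "<mask>"
VOCAB = ["the", "quick", "brown", "fox", "jumps"]  # toy vocabulary (assumption)

def toy_model(seq):
    """Stand-in for the denoiser: returns a (token, confidence) guess for
    every masked position in parallel. A real dLLM would run a single
    transformer forward pass here instead of random choices."""
    rng = random.Random(0)  # deterministic toy predictions
    return {i: (rng.choice(VOCAB), rng.random())
            for i, tok in enumerate(seq) if tok == MASK}

def diffusion_decode(length, steps):
    """Iterative denoising: begin fully masked, then at each step unmask
    the highest-confidence positions, refining the rest next round."""
    seq = [MASK] * length
    per_step = max(1, length // steps)
    for _ in range(steps):
        guesses = toy_model(seq)
        if not guesses:
            break
        # Commit only the top-confidence predictions this step.
        best = sorted(guesses.items(), key=lambda kv: -kv[1][1])[:per_step]
        for pos, (tok, _conf) in best:
            seq[pos] = tok
    return seq

out = diffusion_decode(length=8, steps=4)
print(out)
```

The key contrast with autoregressive decoding is that each step fills in several positions at once, so total latency scales with the number of denoising steps rather than the sequence length.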
Model Details
Context window: 128,000 tokens
Variants
| Name | Parameters | Notes |
|---|---|---|
| Mercury Coder | — | — |
| Mercury 2 | — | Reasoning, AA index 33, 929 tok/s |
Paper
arXiv: 2506.17298