Nemotron 3 Ultra
The largest model in the Nemotron 3 family (Nano / Super / Ultra), released on HuggingFace on June 4, 2026, after Jensen Huang pre-announced it at Computex Taipei on June 1. It has 550B total parameters with 55B active per token (10% active, i.e. ~90% sparsity) and was trained on 20T text tokens in NVFP4 on Blackwell.
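The sparsity figure follows directly from the two parameter counts. The quick check below uses only the 550B / 55B numbers from this card; the per-token FLOP estimate is the common ~2 × N_active rule of thumb, an assumption that ignores attention-score and Mamba-scan costs.

```python
# Rough parameter arithmetic for a sparse MoE model, using this card's
# headline figures (550B total, 55B active per token).
TOTAL_PARAMS = 550e9
ACTIVE_PARAMS = 55e9


def active_fraction(total: float, active: float) -> float:
    """Fraction of all parameters that a single token passes through."""
    return active / total


def forward_flops_per_token(active: float) -> float:
    """Assumed ~2 * N_active rule of thumb for matmul FLOPs per token
    (ignores attention-score and Mamba-scan costs)."""
    return 2.0 * active


frac = active_fraction(TOTAL_PARAMS, ACTIVE_PARAMS)
print(f"active fraction: {frac:.0%}")      # active fraction: 10%
print(f"sparsity:        {1 - frac:.0%}")  # sparsity:        90%
print(f"~forward FLOPs per token: {forward_flops_per_token(ACTIVE_PARAMS):.2e}")
```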
Architecture: a hybrid Mamba-2 / attention Mixture-of-Experts built on LatentMoE (a hardware-aware expert design with a 2,048-dim latent compression). It has 108 layers in total, 512 experts per layer with top-22 routing, grouped-query attention (GQA) with 64 query heads and 2 KV heads, and Multi-Token Prediction (MTP) with 2 shared-weight heads for native speculative decoding. The context window is 1M tokens after a long-context extension phase. Post-training combines SFT, multi-environment reinforcement learning with verifiable rewards (RLVR), and Multi-teacher On-Policy Distillation (MOPD), with explicit reasoning-budget control.
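To make the routing and head-sharing numbers concrete, here is a minimal sketch assuming a generic softmax top-k router and the standard contiguous GQA grouping. `route_top_k` and `kv_head_for_query_head` are hypothetical helpers, not Nemotron code, and the router scores are random stand-ins; the LatentMoE latent projection, the Mamba-2 layers, and the MTP heads are not modeled.

```python
import math
import random

NUM_EXPERTS = 512  # experts per layer (from this card)
TOP_K = 22         # experts activated per token (from this card)
N_Q_HEADS = 64     # query heads (from this card)
N_KV_HEADS = 2     # key/value heads shared via GQA (from this card)


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def route_top_k(logits, k):
    """Hypothetical generic router: keep the k highest-scoring experts and
    softmax over just their logits (equivalent to renormalising the full
    softmax over the selected experts). Returns (expert_id, weight) pairs."""
    top = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:k]
    weights = softmax([logits[i] for i in top])
    return list(zip(top, weights))


def kv_head_for_query_head(q_head, n_q=N_Q_HEADS, n_kv=N_KV_HEADS):
    """Standard GQA grouping: each run of n_q // n_kv consecutive query
    heads reads the same KV head."""
    return q_head // (n_q // n_kv)


rng = random.Random(0)
logits = [rng.gauss(0.0, 1.0) for _ in range(NUM_EXPERTS)]  # stand-in router scores
routes = route_top_k(logits, TOP_K)

print(len(routes), round(sum(w for _, w in routes), 6))       # 22 1.0
print([kv_head_for_query_head(h) for h in (0, 31, 32, 63)])   # [0, 0, 1, 1]
print(f"KV cache per attention layer: {N_Q_HEADS // N_KV_HEADS}x smaller than 64-head MHA")
```

Each token therefore touches 22 of 512 experts in a layer, and 32 query heads share each KV head. Because Mamba-2 layers keep a fixed-size recurrent state rather than a KV cache, the 32× KV-cache saving applies only to the attention layers.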
Throughput: 5.9×, 4.8×, and 1.6× higher inference throughput than GLM-5.1-754B-A40B, Kimi-K2.6-1T-A32B, and Qwen-3.5-397B-A17B, respectively, in the 8K-input / 64K-output setting, at comparable accuracy across agentic and reasoning benchmarks. Artificial Analysis (AA) positions Ultra as the leading US open-weights model on its composite index at launch.
Headline benchmarks (BF16, post-trained): MMLU-Pro 86.8, GPQA (no tools) 87.0, LiveCodeBench v6 89.0, SWE-Bench Verified 71.9, Terminal-Bench 2.1 56.4, RULER @ 1M 94.7, and an AA Intelligence Index v4.1 score of 38 while served at 300+ tokens/s.
Released as four checkpoints under the Linux Foundation OpenMDW-1.1 license: Base-BF16 (pretrained-only), BF16 (post-trained), NVFP4 (quantized for faster inference), and GenRM (the generative reward model used during RL). Distribution targets: HuggingFace, ModelScope, OpenRouter, and build.nvidia.com.
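The NVFP4 checkpoint stores weights as 4-bit E2M1 values that share one scale per 16-element block. The sketch below round-trips a vector through that scheme in pure Python. It is an illustration under simplifying assumptions (the block scale is kept as a Python float instead of FP8 E4M3, there is no second-level per-tensor scale, and ties round toward the smaller magnitude), not NVIDIA's kernel or the checkpoint's exact recipe; all helper names are hypothetical.

```python
# Magnitudes representable by FP4 E2M1 (the sign bit is handled separately).
E2M1_GRID = [0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0]
BLOCK_SIZE = 16  # NVFP4 micro-block: one shared scale per 16 elements


def quantize_e2m1(x):
    """Round x to the nearest signed E2M1 value, saturating at +/-6.
    Ties resolve to the smaller magnitude here (hardware rounding may differ)."""
    mag = min(E2M1_GRID, key=lambda g: abs(abs(x) - g))
    return -mag if x < 0 else mag


def nvfp4_like_roundtrip(values):
    """Quantize then dequantize in 16-element blocks, scaling each block so its
    largest magnitude maps onto 6.0. Simplified: the scale stays a Python float
    (not FP8 E4M3) and there is no second-level per-tensor scale."""
    out = []
    for start in range(0, len(values), BLOCK_SIZE):
        block = values[start:start + BLOCK_SIZE]
        amax = max(abs(v) for v in block)
        scale = amax / 6.0 if amax > 0 else 1.0  # all-zero block: any scale works
        out.extend(quantize_e2m1(v / scale) * scale for v in block)
    return out


def bits_per_weight(block_size=BLOCK_SIZE, elem_bits=4, scale_bits=8):
    """Storage per weight, amortising one 8-bit block scale over the block."""
    return elem_bits + scale_bits / block_size


weights = [0.03 * i - 0.2 for i in range(32)]  # toy data, two blocks
restored = nvfp4_like_roundtrip(weights)
print("max abs error:", max(abs(a - b) for a, b in zip(weights, restored)))
print("bits/weight:", bits_per_weight())  # bits/weight: 4.5

params = 550e9  # total parameters (from this card)
print(f"weights: {params * 2 / 1e9:.0f} GB in BF16 -> "
      f"{params * bits_per_weight() / 8 / 1e9:.0f} GB in NVFP4")
# weights: 1100 GB in BF16 -> 309 GB in NVFP4
```

The footprint line is an upper-bound illustration that assumes every weight is stored in NVFP4; real checkpoints may keep some tensors (for example embeddings or norms) in higher precision.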
Companion datasets released on HuggingFace on June 4 and 5, 2026: Nemotron-Pretraining-Code-v3 (173B tokens of fresh code with a September 2025 cutoff), Nemotron-Pretraining-Legal-v1, Nemotron-Pretraining-Specialized-v1.2 (factual recall and moral scenarios), Nemotron-Posttraining-v3, Nemotron-SFT-SWE-v3, Nemotron-RL-Ultra-Training-Blends, Nemotron-RL-Science-v1, Nemotron-RL-Multichallenge-v1, Nemotron-RL-CFBench-v1, Nemotron-RL-SysBench-v1, Nemotron-RL-InverseIFEval-v1, Nemotron-RL-Instruction-Following-Structured-Outputs-v2, plus Nemotron-Personas-Vietnam and Nemotron-Personas-El-Salvador.
Model Details
Benchmark Scores
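| Benchmark | Score | Checkpoint |
|---|---|---|
| MMLU-Pro | 86.8 | BF16 (post-trained) |
| GPQA (no tools) | 87.0 | BF16 (post-trained) |
| LiveCodeBench v6 | 89.0 | BF16 (post-trained) |
| SWE-Bench Verified | 71.9 | BF16 (post-trained) |
| Terminal-Bench 2.1 | 56.4 | BF16 (post-trained) |
| RULER @ 1M | 94.7 | BF16 (post-trained) |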
Variants
| Name | Parameters | Notes |
|---|---|---|
| Nemotron 3 Ultra 550B-A55B BF16 | 550B | Post-trained flagship; BF16 weights |
| Nemotron 3 Ultra 550B-A55B NVFP4 | 550B | NVFP4-quantized for higher inference throughput on Blackwell |
| Nemotron 3 Ultra 550B-A55B Base BF16 | 550B | Pretrained-only base checkpoint |
| Nemotron 3 Ultra 550B-A55B GenRM | 550B | Generative reward model used during RL post-training |