Nemotron 3 Nano
First generation of Nemotron 3 models. Hybrid LatentMoE + Mamba-2 + attention architecture. The 30B-A3B variant has 23 Mamba-2 layers + 23 MoE layers (128 routed + 1 shared expert, 6 active) + 6 GQA attention layers. 30B total / 3.5B active parameters. Trained on 25T tokens. 1M context window.
AIME25: 89.1 (no tools) / 99.2 (with tools), GPQA: 75.0 (tools), LiveCodeBench: 68.3, MMLU-Pro: 78.3, MATH-500: 95.4 (reasoning mode), RULER@1M: 86.3. The 4B variant (compressed from 9B via structured pruning) fits on an 8GB Jetson Orin Nano for edge deployment.
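The 30B-A3B layer mix above can be sanity-checked with a little arithmetic. A minimal sketch, assuming only the per-type layer counts and expert counts stated in this card (the layer ordering and whether the "6 active" figure includes the shared expert are not specified here):

```python
# Layer composition as stated for Nemotron 3 Nano 30B-A3B.
MAMBA2_LAYERS = 23
MOE_LAYERS = 23
GQA_ATTN_LAYERS = 6

ROUTED_EXPERTS = 128
SHARED_EXPERTS = 1
ACTIVE_EXPERTS = 6  # active per token, per the card (shared-expert inclusion unspecified)

total_layers = MAMBA2_LAYERS + MOE_LAYERS + GQA_ATTN_LAYERS
print(total_layers)  # → 52

# Fraction of routed experts a token touches in each MoE layer,
# treating all 6 active experts as routed (an assumption).
active_fraction = ACTIVE_EXPERTS / ROUTED_EXPERTS
print(f"{active_fraction:.2%}")  # → 4.69%
```

The roughly 1/21 expert activation ratio is consistent with the headline 30B total / 3.5B active split, since the attention and Mamba-2 layers are dense and always active.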
Model Details
Architecture MoE
Parameters 30B
Active params 3.5B
Context window 1,000,000
Variants
| Name | Parameters | Notes |
|---|---|---|
| Nemotron 3 Nano 30B-A3B | 30B | 3.5B active, 1M context |
| Nemotron 3 Nano 4B | 4B | Edge deployment, 262K context |
Paper
arXiv: 2512.20856