First models of the Nemotron 3 generation, built on a hybrid LatentMoE + Mamba-2 + Attention architecture. The 30B-A3B variant stacks 23 Mamba-2 layers, 23 MoE layers (128 routed experts + 1 shared expert, 6 active per token), and 6 GQA attention layers, for 30B total parameters with 3.5B active. Trained on 25T tokens with a 1M-token context window.
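
For concreteness, a minimal sketch of the 30B-A3B layer stack described above (52 layers total). The interleaving pattern, names, and schedule builder are illustrative assumptions; only the per-type layer counts and the expert configuration come from the card:

```python
from dataclasses import dataclass

@dataclass
class MoELayerConfig:
    n_routed_experts: int = 128  # routed experts per MoE layer
    n_shared_experts: int = 1    # always-active shared expert
    n_active_experts: int = 6    # experts selected per token

def build_layer_schedule() -> list[str]:
    """Lay out 23 Mamba-2 + 23 MoE + 6 GQA layers (52 total).

    The even interleaving below is a guess; the released model may
    order its layers differently.
    """
    # Alternate Mamba-2 and MoE blocks: 46 hybrid layers.
    schedule = [kind for pair in zip(["mamba2"] * 23, ["moe"] * 23)
                for kind in pair]
    # Spread the 6 GQA attention layers roughly evenly through the stack.
    for k, pos in enumerate(range(7, 52, 8)):
        schedule.insert(pos + k, "gqa_attention")
    return schedule

schedule = build_layer_schedule()
assert len(schedule) == 52
assert schedule.count("mamba2") == 23
assert schedule.count("moe") == 23
assert schedule.count("gqa_attention") == 6
```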

Benchmarks: AIME25 89.1 (no tools) / 99.2 (with tools); GPQA 75.0 (with tools); LiveCodeBench 68.3; MMLU-Pro 78.3; MATH-500 95.4 (reasoning mode); RULER@1M 86.3. The 4B variant, compressed from a 9B model via structured pruning, fits on an 8GB Jetson Orin Nano for edge deployment.
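
A quick back-of-the-envelope check on the 8GB edge claim. The deployed precision is not stated on the card, so the bytes-per-parameter figures below are assumptions; the point is simply that 4B parameters leave headroom at 8-bit or lower:

```python
def weight_footprint_gib(n_params: float, bytes_per_param: float) -> float:
    """Approximate weight memory (GiB), ignoring KV cache and runtime overhead."""
    return n_params * bytes_per_param / 1024**3

N_PARAMS = 4e9  # Nemotron 3 Nano 4B

# Candidate precisions (assumed, not from the card):
for fmt, bpp in [("FP16", 2.0), ("FP8/INT8", 1.0), ("INT4", 0.5)]:
    print(f"{fmt:9s} ~{weight_footprint_gib(N_PARAMS, bpp):.1f} GiB weights")

# FP16      ~7.5 GiB  -> too tight once KV cache and runtime are added
# FP8/INT8  ~3.7 GiB  -> comfortable fit on an 8GB Jetson Orin Nano
# INT4      ~1.9 GiB
```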

Model Details

Architecture    Hybrid MoE (LatentMoE + Mamba-2 + Attention)
Parameters      30B total
Active params   3.5B
Context window  1M tokens (1,000,000)

Variants

Name                       Parameters   Notes
Nemotron 3 Nano 30B-A3B    30B          1M context
Nemotron 3 Nano 4B         4B           Edge deployment, 262K context
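
For orientation, a typical Hugging Face Transformers loading sketch. The repository ID below is a hypothetical placeholder (check the actual Hub page for the released name), and trust_remote_code may or may not be required for the hybrid Mamba-2/MoE layers:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_ID = "nvidia/Nemotron-3-Nano-30B-A3B"  # hypothetical repo ID

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModelForCausalLM.from_pretrained(
    MODEL_ID,
    torch_dtype="auto",      # keep the checkpoint's native precision
    device_map="auto",       # place layers across available devices
    trust_remote_code=True,  # custom hybrid layers may need this
)

prompt = "What is the sum of the first 100 positive integers?"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=256)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```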

Paper

arXiv: 2512.20856

Tags: moe, open-weight, reasoning, efficiency
