Llama-Nemotron (Nano/Super/Ultra)
Family of reasoning models derived from Meta Llama via Neural Architecture Search (NAS): Ultra (253B, from Llama 3.1 405B with skip attention, variable FFN, and FFN fusion), Super (49B, from Llama 3.3 70B), and Nano (8B). First open models with a dynamic reasoning toggle (on/off at inference time).
Ultra with reasoning ON: MATH-500 97.0, GPQA 76.0, AIME25 72.5, LiveCodeBench 66.3. Outperforms DeepSeek-R1 on GPQA with less than half the parameters. Super fits on a single H100 80GB GPU. v1.5 adds RPO, RLVR, and iterative DPO for enhanced agentic capabilities.
Paper (arXiv) · HuggingFace (Ultra 253B) · HuggingFace (Super 49B) · Artificial Analysis (Ultra) · Artificial Analysis (Super) · OpenRouter (Ultra) · OpenRouter (Super)
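The reasoning toggle mentioned above is controlled through the system prompt rather than a dedicated API flag: per NVIDIA's model cards, the system message "detailed thinking on" / "detailed thinking off" switches the mode per request. A minimal sketch of building such a request payload (the model identifier and the sampling settings are assumptions drawn from the public model card, not from this entry):

```python
# Sketch: per-request reasoning toggle for Llama-Nemotron models.
# Assumption: the mode is set via the system prompt ("detailed thinking
# on"/"detailed thinking off"), as described in NVIDIA's model cards.
# The model name is illustrative; match it to your provider's listing.

def build_request(user_prompt: str, reasoning: bool) -> dict:
    """Build an OpenAI-style chat-completion payload with the
    reasoning mode expressed as a system prompt."""
    mode = "on" if reasoning else "off"
    return {
        "model": "nvidia/llama-3.1-nemotron-ultra-253b-v1",
        "messages": [
            {"role": "system", "content": f"detailed thinking {mode}"},
            {"role": "user", "content": user_prompt},
        ],
        # Assumed sampling defaults: the model card suggests
        # temperature 0.6 / top_p 0.95 for reasoning-on and greedy
        # decoding for reasoning-off; adjust to taste.
        "temperature": 0.6 if reasoning else 0.0,
        "top_p": 0.95 if reasoning else 1.0,
    }

req = build_request("Prove that sqrt(2) is irrational.", reasoning=True)
```

The payload can be sent unchanged to any OpenAI-compatible endpoint (e.g. the OpenRouter listings linked above); only the system message differs between the two modes.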
Model Details
Architecture: Dense
Parameters: 253B (Ultra)
Context window: 128,000 tokens
Variants
| Name | Parameters | Notes |
|---|---|---|
| Llama-3.1-Nemotron-Ultra-253B | 253B | From Llama 3.1 405B; skip attention, variable FFN, FFN fusion |
| Llama-3.3-Nemotron-Super-49B | 49B | From Llama 3.3 70B |
| Llama-3.1-Nemotron-Nano-8B | 8B | — |
Paper
arXiv: 2505.00949