Family of reasoning models derived from Meta Llama via Neural Architecture Search (NAS): Ultra (253B, from Llama 3.1 405B with skip attention, variable FFN, FFN fusion), Super (49B, from Llama 3.3 70B), and Nano (8B). First open models with dynamic reasoning toggle (on/off at inference).

Ultra reasoning ON: MATH-500 97.0, GPQA 76.0, AIME25 72.5, LiveCodeBench 66.3. Outperforms DeepSeek-R1 on GPQA at less than half the parameters. Super fits on single H100-80GB. v1.5 adds RPO, RLVR, and iterative DPO for enhanced agentic capabilities.

Model Details

Architecture DENSE
Context window 128,000
AA Intelligence 15
Base model llama-3.1

Variants

Name Parameters Notes
Llama-3.1-Nemotron-Ultra-253B 253B, derived from Llama 3.1 405B via NAS
Llama-3.3-Nemotron-Super-49B 49B, derived from Llama 3.3 70B via NAS
Llama-3.1-Nemotron-Nano-8B 8B

Paper

open-weightreasoningfrontier

Related