IBM's latest foundation model family: 3B, 8B, and 30B dense decoder-only Transformers (GQA, RoPE, SwiGLU, RMSNorm) trained on ~15T tokens with multi-stage pretraining and long-context extension to 512K tokens. Post-trained with SFT on ~4.1M curated samples and reinforcement learning via on-policy GRPO with the DAPO loss. Granite 4.1 8B-Instruct matches or outperforms the previous Granite 4.0 32B MoE despite being a simpler dense model, demonstrating that training quality can substitute for scale.
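As a rough illustration of two of the named building blocks, here is a minimal PyTorch sketch of RMSNorm and a SwiGLU feed-forward layer as they are commonly wired in pre-norm decoder blocks. The dimensions and layer names are illustrative assumptions, not Granite's actual configuration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class RMSNorm(nn.Module):
    """Root-mean-square norm: rescales by RMS with a learned gain, no mean-centering."""
    def __init__(self, dim: int, eps: float = 1e-6):
        super().__init__()
        self.eps = eps
        self.weight = nn.Parameter(torch.ones(dim))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        rms = torch.rsqrt(x.pow(2).mean(dim=-1, keepdim=True) + self.eps)
        return self.weight * (x * rms)

class SwiGLU(nn.Module):
    """Gated feed-forward: SiLU(x W_gate) * (x W_up), projected back to model dim."""
    def __init__(self, dim: int, hidden_dim: int):
        super().__init__()
        self.w_gate = nn.Linear(dim, hidden_dim, bias=False)
        self.w_up = nn.Linear(dim, hidden_dim, bias=False)
        self.w_down = nn.Linear(hidden_dim, dim, bias=False)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.w_down(F.silu(self.w_gate(x)) * self.w_up(x))

# Pre-norm residual wiring of the feed-forward sub-layer (hypothetical sizes).
x = torch.randn(2, 16, 512)          # (batch, seq, hidden)
ffn = SwiGLU(dim=512, hidden_dim=1408)
norm = RMSNorm(512)
y = x + ffn(norm(x))                 # normalize, transform, add residual
```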

The release also includes Granite Speech 4.1 (ASR plus speech translation), Granite Vision 4.1 (table/chart extraction), Granite Guardian (harm detection), and embedding models. 8B benchmarks: MMLU 73.84, BBH 80.51, GSM8K 92.49, HumanEval 85.37, MBPP 87.30, ArenaHard 68.98. All models are released under Apache 2.0.
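For readers who want to try the instruct model, a minimal sketch of loading it with the Hugging Face `transformers` chat-template flow is below. The repository ID `ibm-granite/granite-4.1-8b-instruct` is an assumption based on IBM's naming pattern; verify the actual ID on the ibm-granite Hugging Face organization before running.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Hypothetical model ID -- check the ibm-granite org for the real repo name.
model_id = "ibm-granite/granite-4.1-8b-instruct"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

messages = [{"role": "user", "content": "Summarize the Granite 4.1 family."}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output = model.generate(inputs, max_new_tokens=128)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```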

Model Details

Architecture: Dense decoder-only Transformer
Parameters: 30B (largest variant; 3B and 8B also available)
Context window: 512K tokens
Training tokens: ~15T
License: Apache 2.0

Benchmark Scores

Benchmark   Score   Mode
MMLU        73.84   5-shot (8B)
BBH         80.51   3-shot CoT (8B)
GSM8K       92.49   8-shot (8B)
HumanEval   85.37   pass@1 (8B)
MBPP        87.30   pass@1 (8B)
ArenaHard   68.98   (8B)

Variants

Name                    Parameters   Notes
Granite 4.1 3B          3B           Dense decoder-only LLM
Granite 4.1 8B          8B           Dense decoder-only LLM
Granite 4.1 30B         30B          Dense decoder-only LLM
Granite Vision 4.1 4B   4B           Vision model for table/chart extraction
Granite Speech 4.1 2B   2B           ASR with speech translation
Tags: open-weight, enterprise
