Compact reasoning model post-trained from DeepSeek-R1-Distill-Qwen-1.5B using PRIME (Process Reinforcement through IMplicit rEwards) with token-level RLOO on ~400k math + ~25k code samples (NuminaMath-CoT, APPS, CodeContests, TACO, Codeforces). Generation length ramped 12k → 24k tokens over training.
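The RLOO side of the recipe uses a leave-one-out baseline: each sampled completion's advantage is its reward minus the mean reward of the other samples in its group. A minimal sketch (the function name and group size are illustrative, not from the PRIME codebase):

```python
import numpy as np

def rloo_advantages(rewards):
    """RLOO advantages for one prompt's group of K sampled completions.

    Each sample's baseline is the mean reward of the other K-1 samples,
    so the estimator stays unbiased without a learned value function.
    """
    rewards = np.asarray(rewards, dtype=float)
    k = rewards.size
    # baseline_i = (sum of all rewards - r_i) / (k - 1)
    baselines = (rewards.sum() - rewards) / (k - 1)
    return rewards - baselines

# Example: 4 rollouts, only the first one solves the problem.
advs = rloo_advantages([1.0, 0.0, 0.0, 0.0])
# advs ≈ [1.0, -0.333, -0.333, -0.333]
```

In PRIME these scalar rewards are augmented with token-level implicit process rewards derived from a learned model, but the leave-one-out baseline itself works as above.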

Self-reported numbers: 88.34% MATH-500, 37.91% GPQA-Diamond, ~40% LeetCode — >50% improvement over the R1-Distill base at the same parameter count. An applied study in token-efficient reasoning at small scale. Not currently scored on Artificial Analysis.

Model Details

Architecture: Dense
Parameters: 1.5B
Base model: DeepSeek-R1-Distill-Qwen-1.5B

Benchmark Scores

Benchmark      Score
MATH-500       88.34%
GPQA Diamond   37.91%
LeetCode       ~40%
Tags: reasoning, open-weight
