Falcon-H1R
A 7B hybrid reasoning model combining Transformer attention and Mamba-2 heads in parallel. Trained via SFT on 3.1M samples followed by GRPO reinforcement learning. 256K context window, with responses of up to 48K tokens. Introduces DeepConf@512 test-time scaling (sketched below): AIME25 reaches 96.7% with 38% fewer tokens.
AIME24: 88.1%, AIME25: 83.1%, MATH500: 97.4%, LiveCodeBench-v6: 68.6%, GPQA-Diamond: 61.3%. AA Intelligence Index: 16. License: CC BY 4.0.
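To make the DeepConf@512 claim concrete, below is a minimal sketch of confidence-filtered weighted majority voting over N sampled traces (e.g. 512). It assumes each trace exposes its final answer and per-token log-probabilities; the confidence statistic, the keep fraction, and any early-stopping rule behind the 38% token saving are illustrative assumptions, not details taken from this card.

```python
import math
from collections import defaultdict
from typing import List, Tuple

def deepconf_vote(traces: List[Tuple[str, List[float]]],
                  keep_fraction: float = 0.1) -> str:
    """Confidence-filtered weighted majority vote over sampled reasoning traces.

    `traces` holds (final_answer, per-token log-probs) for each of the
    N (e.g. 512) samples. Mean log-probability is used as the confidence
    score here purely for illustration.
    """
    # Score each trace by its mean token log-probability (higher = more confident).
    scored = [(ans, sum(lps) / max(len(lps), 1)) for ans, lps in traces]
    scored.sort(key=lambda t: t[1], reverse=True)

    # Keep only the most confident fraction of traces; discarding (or stopping)
    # the rest is where an online variant would save generation tokens.
    kept = scored[: max(1, int(len(scored) * keep_fraction))]

    # Weighted vote: each surviving answer is weighted by its geometric-mean
    # token probability, exp(mean log-prob).
    votes = defaultdict(float)
    for ans, mean_lp in kept:
        votes[ans] += math.exp(mean_lp)
    return max(votes, key=votes.get)
```

With 512 traces and keep_fraction=0.1, the vote runs over only the 51 most confident samples rather than all of them.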
Model Details
Architecture: Dense
Parameters: 7B
Context window: 256,000 tokens
Paper
arXiv: 2601.02346
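For orientation only, a long-response generation call with Hugging Face Transformers might look like the following; the repository id, prompt, and sampling settings are assumptions for illustration, not taken from this card.

```python
# Sketch only: the repo id below is an assumption, not confirmed by this card.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "tiiuae/Falcon-H1R-7B"  # hypothetical checkpoint name
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

prompt = "Prove that the sum of the first n odd numbers is n^2."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

# Reasoning traces can run long; the card quotes responses of up to 48K tokens.
out = model.generate(**inputs, max_new_tokens=48_000, do_sample=True, temperature=0.6)
print(tokenizer.decode(out[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```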