A family of 1B, 3B, 7B, and 10B dense Transformers, plus a pure-SSM Falcon3-Mamba-7B variant. The 7B was trained from scratch on 14T tokens using 1,024 H100 GPUs; the 10B was created by depth-upscaling the 7B and training on 2T additional tokens; the 1B and 3B were derived from the 7B via pruning and distillation. MMLU: 73.1 (10B). Ranked #1 in its size class on the Hugging Face Open LLM Leaderboard at launch.
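
Depth upscaling grows an existing checkpoint into a deeper network by duplicating some of its layers, after which pretraining continues (here, on the 2T additional tokens). Below is a minimal PyTorch sketch assuming a simple duplicate-the-middle-layers scheme; `depth_upscale` and the exact layer selection are illustrative assumptions, not the published Falcon3 recipe.

```python
import copy

import torch.nn as nn


def depth_upscale(blocks: nn.ModuleList, target_depth: int) -> nn.ModuleList:
    """Deepen a layer stack by duplicating a contiguous run of middle layers.

    Hypothetical sketch: the real Falcon3-10B recipe may choose layers
    differently, and the upscaled model is retrained afterwards.
    """
    extra = target_depth - len(blocks)
    assert 0 <= extra <= len(blocks), "this scheme can at most double the depth"
    start = (len(blocks) - extra) // 2  # duplicate the middle `extra` layers
    upscaled = []
    for i, block in enumerate(blocks):
        upscaled.append(block)
        if start <= i < start + extra:
            upscaled.append(copy.deepcopy(block))  # insert a copy right after
    return nn.ModuleList(upscaled)


# Toy usage: grow a 28-layer stack to 40 layers before continued pretraining.
layers = nn.ModuleList(
    nn.TransformerEncoderLayer(d_model=512, nhead=8, batch_first=True)
    for _ in range(28)
)
deeper = depth_upscale(layers, target_depth=40)
assert len(deeper) == 40
```

The duplicated layers start as exact copies, so the upscaled model's behavior is close to the original's at initialization; the continued-pretraining tokens then let the extra depth specialize.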

Model Details

Architecture: Dense Transformer
Parameters: 10B (largest variant)

Variants

Name              Parameters  Notes
Falcon3-1B        1B          Pruned + distilled from 7B
Falcon3-3B        3B          Pruned + distilled from 7B
Falcon3-7B        7B          Trained from scratch on 14T tokens
Falcon3-10B       10B         Depth-upscaled from 7B (+2T tokens)
Falcon3-Mamba-7B  7B          Pure Mamba SSM

All variants are released as open-weight models.
