Original Falcon series. Dense causal decoder-only Transformers with multi-query attention, rotary position embeddings (RoPE), and FlashAttention. Trained primarily on RefinedWeb, a 5T-token open web corpus. Falcon-180B (3,500B tokens, trained on 4,096 A100 GPUs) was the largest open-weight model at launch.

Falcon-7B: 1,500B tokens, 384 A100s. Falcon-40B: 1,000B tokens. Falcon-7B and Falcon-40B are Apache 2.0; Falcon-180B is released under the TII License.
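The multi-query attention used across the series differs from standard multi-head attention in that all query heads share a single key/value head, shrinking the KV cache during inference. A minimal NumPy sketch of the idea (shapes, names, and the toy dimensions are illustrative, not Falcon's actual configuration):

```python
import numpy as np

def multi_query_attention(x, w_q, w_kv, n_heads):
    """Multi-query attention: n_heads query heads share ONE key/value head.

    x:    (seq, d_model) input activations
    w_q:  (d_model, n_heads * d_head) query projection
    w_kv: (d_model, 2 * d_head) shared key/value projection
    """
    seq, d_model = x.shape
    d_head = w_q.shape[1] // n_heads

    q = (x @ w_q).reshape(seq, n_heads, d_head)   # per-head queries
    k, v = np.split(x @ w_kv, 2, axis=-1)         # one shared K and V for all heads

    # Scaled dot-product scores per query head against the shared keys.
    scores = np.einsum("qhd,kd->hqk", q, k) / np.sqrt(d_head)

    # Causal mask: position q may only attend to positions <= q.
    mask = np.triu(np.ones((seq, seq), dtype=bool), k=1)
    scores = np.where(mask, -np.inf, scores)

    # Numerically stable softmax over the key axis.
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)

    out = np.einsum("hqk,kd->qhd", weights, v)    # (seq, n_heads, d_head)
    return out.reshape(seq, n_heads * d_head)

rng = np.random.default_rng(0)
seq, d_model, n_heads, d_head = 5, 32, 4, 8
x = rng.standard_normal((seq, d_model))
w_q = rng.standard_normal((d_model, n_heads * d_head)) * 0.1
w_kv = rng.standard_normal((d_model, 2 * d_head)) * 0.1
y = multi_query_attention(x, w_q, w_kv, n_heads)
print(y.shape)
```

The KV projection here is `2 * d_head` wide instead of `2 * n_heads * d_head`, which is the whole trick: the cached keys and values are n_heads times smaller than in multi-head attention.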

Model Details

Architecture DENSE
Parameters 180B
Context window 2,048

Variants

Name Parameters Notes
Falcon-7B 7B
Falcon-40B 40B
Falcon-180B 180B

Paper

arXiv: 2311.16867

open-weight · frontier

Related