Llama 2
model7B, 13B, and 70B parameter dense Transformers trained on 2T tokens. 70B uses Grouped Query Attention (GQA). 4K token context. First Llama with commercial licensing. Chat variants fine-tuned with RLHF.
Llama 2 was the first open-weight model with a permissive commercial license, enabling enterprise adoption at scale. The 70B chat model was competitive with early GPT-3.5. AA Intelligence Index: 8 (70B). By Touvron et al.
Model Details
Architecture DENSE
Parameters 70B
Context window 4,096
Variants
| Name | Parameters | Notes |
|---|---|---|
| Llama 2 7B | 7B | — |
| Llama 2 13B | 13B | — |
| Llama 2 70B | 70B | — |
Paper
arXiv: 2307.09288