"LLaMA: Open and Efficient Foundation Language Models." A family of dense Transformers from 7B to 65B parameters, trained only on publicly available data. LLaMA-65B is competitive with Chinchilla-70B and PaLM-540B, and LLaMA-13B outperforms GPT-3 (175B) on most benchmarks.

LLaMA showed that smaller models trained on more data (building on the Chinchilla scaling laws) could match much larger models. Its release catalyzed a wave of open-source fine-tuning (Alpaca, Vicuna, and others) and established Meta as a leader of the open-weight movement. Paper by Touvron et al.
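The "smaller models, more data" point can be made concrete with a rough tokens-per-parameter calculation. The sketch below is illustrative only and rests on assumptions not stated in this summary: the training-token counts (1.0T for the 7B and 13B models, 1.4T for the 33B and 65B models) are taken from the LLaMA paper, the parameter counts are the nominal sizes rather than exact ones, and the figure of about 20 tokens per parameter is a commonly quoted rule of thumb derived from the Chinchilla results, not an exact law. The names `LLAMA_VARIANTS`, `tokens_per_param`, and `summarize` are hypothetical helpers.

```python
# Rough tokens-per-parameter comparison for the LLaMA variants.
# Assumption: ~20 tokens per parameter is a rule-of-thumb reading of the
# Chinchilla compute-optimal result, not an exact constant.
CHINCHILLA_TOKENS_PER_PARAM = 20

# Assumption: nominal parameter counts and training-token counts as reported
# in the LLaMA paper (1.0T tokens for 7B/13B, 1.4T tokens for 33B/65B).
LLAMA_VARIANTS = {
    "LLaMA 7B": (7e9, 1.0e12),
    "LLaMA 13B": (13e9, 1.0e12),
    "LLaMA 33B": (33e9, 1.4e12),
    "LLaMA 65B": (65e9, 1.4e12),
}


def tokens_per_param(params: float, tokens: float) -> float:
    """Return the number of training tokens seen per model parameter."""
    return tokens / params


def summarize(variants: dict) -> list:
    """Return (name, tokens/param, multiple of the rule of thumb) per variant."""
    rows = []
    for name, (params, tokens) in variants.items():
        ratio = tokens_per_param(params, tokens)
        rows.append((name, ratio, ratio / CHINCHILLA_TOKENS_PER_PARAM))
    return rows


if __name__ == "__main__":
    for name, ratio, multiple in summarize(LLAMA_VARIANTS):
        print(f"{name}: {ratio:.0f} tokens/param "
              f"({multiple:.1f}x the rule of thumb)")
```

Under these assumptions the 65B model lands close to the rule-of-thumb ratio, while the smaller variants see several times more tokens per parameter, which is consistent with the claim that they are trained on comparatively more data for their size.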

Model Details

Architecture DENSE
Parameters 65B (largest variant)

Variants

Name        Parameters  Notes
LLaMA 7B    7B
LLaMA 13B   13B
LLaMA 33B   33B
LLaMA 65B   65B

Paper

Citations 3,895
open-weight, foundational

Related