OpenELM (Open Efficient Language Models). A family of 270M to 3B parameter dense Transformers with layer-wise scaling (per-layer variation of width for parameter efficiency). Fully open release: training code, data, weights, and evaluation code.

Apple's first open-weight language model family, trained on publicly available data. Presented at an ICML 2024 workshop. Apache 2.0 license.
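The layer-wise scaling mentioned above can be sketched as follows. Rather than a uniform width, each layer gets its own attention-head count and FFN multiplier, interpolated from the first layer to the last. The interpolation bounds and base dimensions below are illustrative assumptions, not the released OpenELM configurations.

```python
# Sketch of layer-wise scaling: linearly interpolate per-layer attention
# heads (via alpha) and FFN width multiplier (via beta) across the depth
# of the network. Bounds are illustrative, not OpenELM's actual values.

def layerwise_scaling(num_layers, base_heads,
                      alpha_min, alpha_max, beta_min, beta_max):
    configs = []
    for i in range(num_layers):
        t = i / (num_layers - 1)          # 0.0 at first layer, 1.0 at last
        alpha = alpha_min + t * (alpha_max - alpha_min)
        beta = beta_min + t * (beta_max - beta_min)
        configs.append({
            "layer": i,
            "heads": max(1, round(alpha * base_heads)),
            "ffn_multiplier": round(beta, 2),
        })
    return configs

cfgs = layerwise_scaling(num_layers=8, base_heads=12,
                         alpha_min=0.5, alpha_max=1.0,
                         beta_min=2.0, beta_max=4.0)
for c in cfgs:
    print(c)
```

With these sample bounds, early layers are narrower (6 heads, 2.0x FFN) and later layers wider (12 heads, 4.0x FFN), so the parameter budget shifts toward the deeper layers while total size stays fixed.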

Model Details

Architecture Dense Transformer
Parameters 270M to 3B

Variants

Name            Parameters
OpenELM 270M    270M
OpenELM 450M    450M
OpenELM 1.1B    1.1B
OpenELM 3B      3B

Each size is also available as an instruction-tuned (Instruct) variant.

Paper

arXiv: 2404.14619 ("OpenELM: An Efficient Language Model Family with Open Training and Inference Framework")

Tags: open-weight, open-source