First general-purpose 67B model, which outperformed Llama 2 at the time of release, accompanied by a technical report on scaling open-source language models with a long-term vision.

Outputs: 2

DeepSeek-LLM

model

First general-purpose 67B model, outperforming Llama 2 at the time.

Architecture: Dense
Parameters: 67B

DeepSeek LLM: Scaling Open-Source Language Models with Longtermism

paper

Technical report on scaling open-source language models with a long-term vision.

arXiv: 2401.02954

open-weight, scaling