Frontier 671B-parameter MoE model with Multi-Token Prediction and FP8 mixed-precision training. The V3-0324 update was released on 2025-03-24. Accompanied by a technical report and a paper on scaling challenges.


DeepSeek-V3

model
Architecture MoE
Parameters 671B
Active params 37B
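The 37B-active-of-671B-total split comes from MoE routing: each token is sent to only a few experts, so only a fraction of the parameters run per forward pass. A minimal top-k routing sketch (toy dense experts and a hypothetical `topk_moe_forward` helper; not DeepSeek-V3's actual routing, which is considerably more elaborate):

```python
import numpy as np

def topk_moe_forward(x, gate_w, experts, k=2):
    """Route each token to its top-k experts; only those experts run.

    x: (tokens, d) activations; gate_w: (d, n_experts) router weights;
    experts: list of (d, d) toy expert matrices. Illustrative only.
    """
    logits = x @ gate_w                          # (tokens, n_experts)
    topk = np.argsort(logits, axis=-1)[:, -k:]   # indices of the k largest
    sel = np.take_along_axis(logits, topk, axis=-1)
    w = np.exp(sel - sel.max(-1, keepdims=True)) # softmax over selected only
    w /= w.sum(-1, keepdims=True)
    out = np.zeros_like(x)
    for t in range(x.shape[0]):
        for j in range(k):
            e = topk[t, j]
            out[t] += w[t, j] * (x[t] @ experts[e])
    return out, topk

rng = np.random.default_rng(0)
d, n_experts, tokens = 8, 4, 3
x = rng.standard_normal((tokens, d))
gate_w = rng.standard_normal((d, n_experts))
experts = [rng.standard_normal((d, d)) for _ in range(n_experts)]
y, routed = topk_moe_forward(x, gate_w, experts, k=2)
# Each token touched only 2 of 4 experts, so roughly half the expert
# parameters were active -- the same idea behind 37B active of 671B.
```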

Variants

Name              Parameters  Notes
DeepSeek-V3       671B
DeepSeek-V3-0324  671B        Released 2025-03-24

DeepSeek-V3 Technical Report

paper

Technical report for the landmark 671B MoE model with Multi-Token Prediction and FP8 mixed-precision training.
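Multi-Token Prediction trains the model to predict several future tokens at once rather than only the next one. A toy cross-entropy sketch of the idea, where head *i* predicts the token *i* steps ahead (hypothetical `mtp_loss` helper and shapes; not the report's actual MTP module):

```python
import numpy as np

def mtp_loss(hidden, heads, targets):
    """Average cross-entropy over multi-depth prediction heads.

    hidden: (T, d) final hidden states; heads: list of (d, V) output
    matrices, head i predicting the token i+1 steps ahead;
    targets: (T,) int token ids. Toy sketch only.
    """
    total, count = 0.0, 0
    for depth, head in enumerate(heads, start=1):
        logits = hidden @ head                   # (T, V)
        for t in range(len(targets) - depth):
            z = logits[t]
            # log-softmax: z - logsumexp(z), computed stably
            logp = z - z.max() - np.log(np.exp(z - z.max()).sum())
            total -= logp[targets[t + depth]]    # NLL of the future token
            count += 1
    return total / count

rng = np.random.default_rng(1)
T, d, V = 6, 8, 10
hidden = rng.standard_normal((T, d))
heads = [rng.standard_normal((d, V)) for _ in range(2)]  # depths 1 and 2
targets = rng.integers(0, V, size=T)
loss = mtp_loss(hidden, heads, targets)
```

With one head this reduces to ordinary next-token prediction; the extra depths densify the training signal per sequence.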

arXiv: 2412.19437

Insights into DeepSeek-V3: Scaling Challenges

paper

Paper detailing the scaling challenges encountered during DeepSeek-V3 development, including hardware architecture insights.

arXiv: 2505.09343

moe · open-weight · frontier · scaling