dots.llm1
model paper
Xiaohongshu hi lab's first open-weight LLM: a 142B-parameter MoE that activates 14B parameters per token, pretrained from scratch on 11.2T high-quality tokens with no synthetic data. It achieves performance comparable to Qwen2.5-72B, and the company claimed that its Chinese-language results surpassed Qwen2.5-72B-Instruct and DeepSeek-V3 on Chinese comprehension. MIT-licensed, with unusual openness: intermediate checkpoints were released every 1T training tokens, enabling training-dynamics research on a production-scale run.
Outputs 2
dots.llm1
model
Architecture MoE
Parameters 142B
Active params 14B
Context window 32,768 tokens
Training tokens 11.2T
License MIT
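The figures above imply a sparse activation ratio and a rough number of checkpoint intervals. A minimal sketch of that arithmetic follows. The constant and function names are illustrative, and the interval count assumes checkpoints fall exactly on each 1T-token boundary, which this listing does not confirm.

```python
# Figures taken from the spec list above.
TOTAL_PARAMS = 142e9          # total parameters
ACTIVE_PARAMS = 14e9          # parameters activated per token
TRAINING_TOKENS = 11.2e12     # pretraining tokens
CHECKPOINT_INTERVAL = 1e12    # assumed: one checkpoint per full 1T tokens


def active_fraction(total: float, active: float) -> float:
    """Share of the model's parameters used for a single token."""
    return active / total


def full_intervals(tokens: float, interval: float) -> int:
    """Number of complete checkpoint intervals that fit in the run."""
    return int(tokens // interval)


frac = active_fraction(TOTAL_PARAMS, ACTIVE_PARAMS)
n_intervals = full_intervals(TRAINING_TOKENS, CHECKPOINT_INTERVAL)

print(f"Active fraction: {frac:.1%}")            # about 9.9%
print(f"Full 1T-token intervals: {n_intervals}")  # 11
```

Roughly one tenth of the parameters are active per token, which is why the per-token compute cost tracks the 14B figure rather than the 142B total.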
dots.llm1 Technical Report
paper
Details the MoE architecture, the 11.2T-token non-synthetic data pipeline, and the efficient training infrastructure.