dots.llm1

Xiaohongshu hi lab's first open-weight LLM: a 142B-parameter mixture-of-experts (MoE) model that activates 14B parameters per token, pretrained from scratch on 11.2T high-quality tokens with no synthetic data. Its performance is comparable to Qwen2.5-72B, and the company claimed that its Chinese-language results surpass Qwen2.5-72B-Instruct and DeepSeek-V3 on Chinese comprehension. MIT-licensed and unusually open: intermediate checkpoints are released for every 1T training tokens, enabling training-dynamics research on a production-scale run.
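For a sense of scale, the figures quoted above can be turned into two rough numbers: the share of parameters active per token, and how many intermediate checkpoints a 1T-token cadence implies. This is only a back-of-the-envelope sketch. The helper names are hypothetical, and it assumes checkpoints fall at exact multiples of 1T tokens, which the description does not confirm.

```python
# Back-of-the-envelope figures derived only from the numbers quoted above.
TOTAL_PARAMS_B = 142      # total parameters, in billions
ACTIVE_PARAMS_B = 14      # parameters activated per token, in billions
PRETRAIN_TOKENS_T = 11.2  # pretraining tokens, in trillions
CHECKPOINT_EVERY_T = 1    # checkpoint interval, in trillions of tokens


def active_fraction(active_b: float, total_b: float) -> float:
    """Share of the model's parameters used for any single token."""
    return active_b / total_b


def checkpoint_marks(total_t: float, every_t: int) -> list[int]:
    """Token counts (in trillions) at which an intermediate checkpoint would fall.

    Assumption: checkpoints sit at exact multiples of the interval.
    """
    last_mark = int(total_t // every_t) * every_t
    return list(range(every_t, last_mark + 1, every_t))


if __name__ == "__main__":
    frac = active_fraction(ACTIVE_PARAMS_B, TOTAL_PARAMS_B)
    marks = checkpoint_marks(PRETRAIN_TOKENS_T, CHECKPOINT_EVERY_T)
    print(f"active fraction per token: {frac:.1%}")
    print(f"intermediate checkpoints: {len(marks)} (at {marks} T tokens)")
```

Under these assumptions, roughly a tenth of the parameters are used per token, and the run would yield eleven intermediate checkpoints before the final 11.2T-token model.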

HuggingFace | GitHub | Paper (arXiv)
