OpenSeek is an open-source initiative to unite the global community in developing next-generation language models, inspired by DeepSeek. It adopts the DeepSeek-V3 MoE architecture (64 experts, top-6 routing). OpenSeek-Small v1 is the first-stage model, with 1.4B total / 0.4B active parameters, trained on 720B tokens. The project addresses three challenges: high-quality data acquisition, algorithmic innovation, and distributed training systems, and uses FlagScale for distributed training.
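
Below is a minimal PyTorch sketch of top-k expert routing in the style described above (64 experts, top-6 selection per token). The hidden sizes, the softmax gate, and the expert MLPs are illustrative assumptions, not the actual OpenSeek configuration.

```python
# A minimal sketch of top-k expert routing in the style of DeepSeek-V3
# MoE layers (64 routed experts, top-6 selection per token). Hidden
# sizes, the softmax gate, and the expert MLPs are illustrative
# assumptions, not the actual OpenSeek configuration.
import torch
import torch.nn as nn
import torch.nn.functional as F

class TopKMoE(nn.Module):
    def __init__(self, d_model=512, d_ff=1024, n_experts=64, top_k=6):
        super().__init__()
        self.top_k = top_k
        # Router: one logit per expert for every token.
        self.gate = nn.Linear(d_model, n_experts, bias=False)
        # Each expert is a small independent feed-forward network.
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, d_ff), nn.SiLU(), nn.Linear(d_ff, d_model))
            for _ in range(n_experts)
        )

    def forward(self, x):
        # x: (n_tokens, d_model)
        scores = F.softmax(self.gate(x), dim=-1)               # (n_tokens, n_experts)
        weights, idx = scores.topk(self.top_k, dim=-1)         # keep the 6 best experts
        weights = weights / weights.sum(dim=-1, keepdim=True)  # renormalize gate weights
        out = torch.zeros_like(x)
        for k in range(self.top_k):
            for e in idx[:, k].unique():
                mask = idx[:, k] == e                          # tokens routed to expert e in slot k
                out[mask] += weights[mask, k, None] * self.experts[int(e)](x[mask])
        return out
```

Since each token activates only 6 of the 64 experts, most expert parameters sit idle for any given token, which is how a model with 1.4B total parameters can run with roughly 0.4B active parameters per token.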

Model Details

Variants

Name                        Parameters                Notes
OpenSeek-Small-v1-Baseline  1.4B total / 0.4B active  Trained on 100B tokens
OpenSeek-Small-v1           1.4B total / 0.4B active  Trained on 720B tokens
Tags: nlp, open-weight, moe
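
As a usage illustration, the following sketch loads one of the variants above with Hugging Face Transformers; the repository ID is an assumption and should be checked against the project's official release.

```python
# Hypothetical usage sketch for OpenSeek-Small-v1. The repo ID below is
# an assumption; consult the project's release page for the actual path.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "BAAI/OpenSeek-Small-v1"  # hypothetical repository ID
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(model_id, trust_remote_code=True)

inputs = tokenizer("Mixture-of-Experts models scale by", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=50)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```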
