Solar Open 100B
A 102.6B-parameter bilingual (English/Korean) MoE model with 12B active parameters per token. Architecture: 48 layers, 129 experts (128 routed + 1 shared, top-8 routing), 4096 hidden dim, 64 attention heads, 8 KV heads (GQA), 196,608-token vocabulary. Trained on 480 NVIDIA B200 GPUs (60 nodes) using TorchTitan with hybrid FP8/bfloat16 precision.
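The routing pattern above (top-8 of 128 routed experts, plus one shared expert applied to every token) can be sketched as follows. This is a minimal illustrative sketch, not the released implementation; all function names and the toy scalar inputs are assumptions.

```python
import heapq
import math

N_ROUTED = 128  # routed experts per MoE layer (from the architecture above)
TOP_K = 8       # experts selected per token

def route(logits):
    """Pick the top-8 routed experts for one token and softmax-normalize
    their gate weights. `logits` is a list of 128 router scores."""
    assert len(logits) == N_ROUTED
    top = heapq.nlargest(TOP_K, range(N_ROUTED), key=lambda i: logits[i])
    m = max(logits[i] for i in top)
    exps = [math.exp(logits[i] - m) for i in top]
    total = sum(exps)
    return [(i, e / total) for i, e in zip(top, exps)]

def moe_forward(x, routed_experts, shared_expert, logits):
    """Token output = shared-expert output plus the gate-weighted sum of
    the top-8 routed experts (here over toy scalar inputs)."""
    out = shared_expert(x)
    for i, gate in route(logits):
        out += gate * routed_experts[i](x)
    return out
```

Only the 8 selected experts run per token, which is how a 102.6B-parameter model activates roughly 12B parameters per forward pass.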
Pre-trained on 19.7T tokens (13T English, 3.7T math/code, 1.1T Korean, 0.8T Japanese, 0.8T multilingual, 0.4T domain-specific) via a phased curriculum that progressively raised the synthetic-data share from 10% to 64%. The 4.5T synthetic tokens were generated by Solar Pro 2 and open-source models, including entirely synthetic Korean STEM data. Mid-training added reasoning (850B), long-context (135B), and high-quality (170B) tokens.
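The phased curriculum can be sketched as a schedule over the synthetic-data share. Only the 10% and 64% endpoints come from the text; the number of phases and the linear interpolation are assumptions for illustration.

```python
def synthetic_ratio(phase, n_phases=4, start=0.10, end=0.64):
    """Hypothetical schedule: linearly interpolate the synthetic-data
    share from `start` in the first phase to `end` in the last."""
    if n_phases == 1:
        return end
    t = phase / (n_phases - 1)
    return start + t * (end - start)

def phase_mixture(phase, n_phases=4):
    """Sampling weights for one phase: synthetic vs. web/curated data."""
    s = synthetic_ratio(phase, n_phases)
    return {"synthetic": s, "natural": 1.0 - s}
```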
Introduces SnapPO (Snapshot Sampling for Policy Optimization), an off-policy RL framework that decouples generation, reward computation, and training into independent stages for linear scalability. Uses domain-specific rewards: verifiable correctness for STEM, execution-based for code, composite rewards for agent simulation.
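The three decoupled SnapPO stages can be sketched as functions communicating through a snapshot of scored rollouts. This is a toy illustration of the staged, off-policy structure described above; the function names, the scalar "responses", and the reward-weighted update are assumptions, not the paper's objective.

```python
def domain_of(prompt):
    """Hypothetical domain tagger used to pick a reward function."""
    return "math" if prompt.startswith("math:") else "code"

def generate_snapshot(policy, prompts):
    """Stage 1: sample rollouts from a frozen policy snapshot; no reward
    or gradient computation happens here, so it can scale independently."""
    return [{"prompt": p, "response": policy(p)} for p in prompts]

def score_snapshot(snapshot, reward_fns):
    """Stage 2: attach domain-specific rewards (e.g. verifiable STEM
    correctness, code execution) without touching the trainer."""
    for item in snapshot:
        item["reward"] = reward_fns[domain_of(item["prompt"])](item)
    return snapshot

def train_step(theta, snapshot, lr=0.5):
    """Stage 3: off-policy update from the scored snapshot. A toy
    reward-weighted nudge stands in for the real RL objective."""
    grad = sum(it["reward"] * (it["response"] - theta) for it in snapshot)
    return theta + lr * grad / max(len(snapshot), 1)
```

Because each stage only reads the previous stage's snapshot, the stages can run on separate hardware pools and scale independently, which is the decoupling the framework is built around.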
MMLU: 88.2, AIME 2024: 91.7, AIME 2025: 84.3, LiveCodeBench: 74.2, KMMLU: 73.0. Matches GLM-4.5-Base at 48% of its English training budget and 77% of its Korean budget. Leads comparable models in Korean domain expertise: medical +8.6pp, finance +3.0pp, law +2.7pp. Released under Apache 2.0.
Paper
arXiv: 2601.07022