Grok-1
314B-parameter Mixture-of-Experts (MoE) model (8 experts, 2 active per token, ~78B active parameters). 64 Transformer layers, 48 query attention heads, and 8 key-value heads (grouped-query attention). 8,192-token context window. Open-sourced under the Apache 2.0 license on March 17, 2024.
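A minimal sketch of grouped-query attention with the head counts above (48 query heads sharing 8 KV heads, i.e. 6 query heads per KV head). The sequence length, head dimension, and variable names are illustrative assumptions, not xAI's implementation; the mask is also omitted for brevity.

```python
import jax
import jax.numpy as jnp

Q_HEADS, KV_HEADS = 48, 8          # head counts from the spec above
GROUP = Q_HEADS // KV_HEADS        # 6 query heads share each KV head
SEQ, HEAD_DIM = 32, 16             # toy sequence length and head size (assumed)

key = jax.random.PRNGKey(0)
kq, kk, kv = jax.random.split(key, 3)
q = jax.random.normal(kq, (SEQ, Q_HEADS, HEAD_DIM))
k = jax.random.normal(kk, (SEQ, KV_HEADS, HEAD_DIM))
v = jax.random.normal(kv, (SEQ, KV_HEADS, HEAD_DIM))

# Expand each KV head across its group of query heads, then attend as usual
# (no causal mask here, for brevity).
k = jnp.repeat(k, GROUP, axis=1)                      # [SEQ, 48, HEAD_DIM]
v = jnp.repeat(v, GROUP, axis=1)
scores = jnp.einsum("qhd,khd->hqk", q, k) / jnp.sqrt(HEAD_DIM)
out = jnp.einsum("hqk,khd->qhd", jax.nn.softmax(scores, axis=-1), v)
print(out.shape)  # (32, 48, 16)
```

The point of the grouping is memory: the KV cache stores 8 heads rather than 48, while query-side capacity stays at 48 heads.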
The first Grok model and one of the largest open-weight models at the time of release; the GitHub repository reached 51.5K stars. Originally released in November 2023 on the X platform and open-sourced four months later.
Model Details
Architecture: MoE (8 experts, 2 active per token)
Parameters: 314B total
Active params: ~78B
Context window: 8,192 tokens
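To make the "Active params" figure concrete, the sketch below routes each token through 2 of 8 experts with a softmax gate, which is how a top-2 MoE keeps only a fraction of the expert weights active per token. The shapes, names, and toy hidden size are assumptions for illustration, not the released Grok-1 code.

```python
import jax
import jax.numpy as jnp

NUM_EXPERTS = 8   # experts per MoE layer (as listed above)
TOP_K = 2         # experts active per token (as listed above)
D_MODEL = 64      # toy hidden size; Grok-1's real width is much larger

def moe_layer(x, router_w, expert_ws):
    """x: [tokens, D_MODEL]; router_w: [D_MODEL, NUM_EXPERTS];
    expert_ws: [NUM_EXPERTS, D_MODEL, D_MODEL] (toy single-matrix experts)."""
    logits = x @ router_w                               # [tokens, NUM_EXPERTS]
    gate_logits, expert_idx = jax.lax.top_k(logits, TOP_K)
    gates = jax.nn.softmax(gate_logits, axis=-1)        # renormalize over the top-2
    chosen = expert_ws[expert_idx]                      # [tokens, TOP_K, D, D]
    expert_out = jnp.einsum("td,tkdo->tko", x, chosen)  # run both chosen experts
    return jnp.einsum("tk,tko->to", gates, expert_out)  # gate-weighted mix

key = jax.random.PRNGKey(0)
k1, k2, k3 = jax.random.split(key, 3)
x = 0.02 * jax.random.normal(k1, (4, D_MODEL))
router_w = 0.02 * jax.random.normal(k2, (D_MODEL, NUM_EXPERTS))
expert_ws = 0.02 * jax.random.normal(k3, (NUM_EXPERTS, D_MODEL, D_MODEL))
print(moe_layer(x, router_w, expert_ws).shape)  # (4, 64)
```

A dense gather of the chosen expert weights is used here only to keep the sketch short; a production MoE dispatches tokens to experts sparsely so that inactive expert weights are never touched.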