JAIS 2
Second-generation JAIS. 8B and 70B models trained from scratch (not adapted) on 2.6T tokens of Arabic, English, and code using Cerebras CS-2 systems on Condor Galaxy clusters. Custom Arabic-centric vocabulary (150K tokens), RoPE positional embeddings, Squared-ReLU activation, muP. Post-trained via SFT (20M+ instruction pairs) → DPO → GRPO.
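The Squared-ReLU activation mentioned above is a simple variant of ReLU that squares the positive part of the input; a minimal NumPy sketch (illustrative only, not the JAIS 2 implementation):

```python
import numpy as np

def squared_relu(x: np.ndarray) -> np.ndarray:
    """Squared-ReLU: max(0, x)^2.

    Zero for negative inputs, like ReLU, but grows
    quadratically for positive inputs.
    """
    return np.maximum(0.0, x) ** 2

# Example: negative inputs map to 0, positive inputs are squared
squared_relu(np.array([-2.0, -0.5, 0.0, 1.0, 3.0]))
# → array([0., 0., 0., 1., 9.])
```

Compared with plain ReLU, the squared form is smooth at the origin from the right and is used in some recent transformer feed-forward blocks.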
AraGen-12-24 overall score: 70.71% (outperforming Qwen2.5-72B and Llama-3.3-70B on Arabic generative tasks). Inference throughput: ~2,000 tokens/sec on Cerebras hardware.
Model Details
Architecture: dense
Parameters: 70B (flagship)
Context window: 8,192 tokens
Variants
| Name | Parameters | Notes |
|---|---|---|
| JAIS-2-8B | 8B | — |
| JAIS-2-70B | 70B | — |