Trinity Mini / Nano
Smaller Trinity variants sharing the same architecture. Mini: 26B total parameters, 3B active (128 experts, top-8 routing, 131K context). Nano: 6B total parameters, 1B active (128 experts, 128K context). Both were trained on 10T tokens using 512 H200 GPUs. Licensed under Apache 2.0.
Model Details
Architecture MoE
Parameters 26B
Active params 3B
Context window 131,000
Variants
| Name | Parameters | Notes |
|---|---|---|
| Trinity Mini | 26B | 3B active; 128 experts, top-8; 131K context |
| Trinity Nano | 6B | 1B active; 128 experts; 128K context |
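To make the total-versus-active figures concrete, here is a minimal Python sketch that computes what share of each variant's parameters is active per token, using only the numbers quoted on this page. The `Variant` class and its field names are hypothetical, introduced for illustration only. Nano's top-k value is not stated here, so it is left as `None` rather than guessed.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Variant:
    """Hypothetical container for the figures quoted on this page."""
    name: str
    total_params_b: float   # total parameters, in billions
    active_params_b: float  # parameters active per token, in billions
    num_experts: int
    top_k: Optional[int]    # experts routed per token; None where not stated

    @property
    def active_fraction(self) -> float:
        # Share of all parameters that participate in one token's forward pass.
        return self.active_params_b / self.total_params_b

    @property
    def routed_expert_fraction(self) -> Optional[float]:
        # Share of experts selected per token, when top_k is known.
        if self.top_k is None:
            return None
        return self.top_k / self.num_experts


VARIANTS = [
    Variant("Trinity Mini", 26, 3, 128, 8),
    Variant("Trinity Nano", 6, 1, 128, None),  # top-k not given on this page
]

for v in VARIANTS:
    print(f"{v.name}: {v.active_fraction:.1%} of parameters active per token")
```

Under these figures, Mini activates roughly 11.5% of its parameters per token and Nano roughly 16.7%. Mini routes each token to 8 of 128 experts (6.25%). The active-parameter share is higher than the routed-expert share, which is expected if some parameters run for every token, but this page gives no breakdown.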
Paper
arXiv: 2602.17004