Apertus is Switzerland's sovereign multilingual LLM, released as 8B and 70B dense Transformers trained on 15 trillion tokens across 1,811 languages (~40% non-English) using 4,096 NVIDIA GH200 GPUs on the CSCS Alps supercomputer (10M+ GPU hours). 65K-token context window. 101+ authors.
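
For a sense of scale, the headline figures can be turned into rough derived quantities. This is a back-of-the-envelope sketch only: it treats "~40%" as exactly 40% and "10M+" as exactly 10 million GPU hours, and it assumes all 4,096 GPUs ran concurrently for the whole budget, which the card does not state. The variable names are illustrative.

```python
# Figures quoted in the card, rounded as stated there.
TOTAL_TOKENS = 15 * 10**12      # 15 trillion training tokens
NON_ENGLISH_PERCENT = 40        # "~40%" treated as exactly 40 (assumption)
GPU_COUNT = 4096                # NVIDIA GH200 GPUs
GPU_HOURS = 10_000_000          # "10M+" treated as a lower bound (assumption)

# Approximate number of non-English training tokens.
non_english_tokens = TOTAL_TOKENS * NON_ENGLISH_PERCENT // 100

# Idealized wall-clock time if every GPU ran for the entire budget.
wall_clock_hours = GPU_HOURS / GPU_COUNT
wall_clock_days = wall_clock_hours / 24

print(f"non-English tokens: ~{non_english_tokens / 1e12:.0f}T")
print(f"idealized wall-clock: ~{wall_clock_hours:.0f} h (~{wall_clock_days:.0f} days)")
```

Under these assumptions the non-English portion is about 6 trillion tokens, and the GPU-hour budget corresponds to roughly 2,441 hours (about 102 days) of full-cluster time. The real schedule may differ, since the budget presumably covers more than one training run.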

Novel contributions: the xIELU activation function, the AdEMAMix optimizer, and the Goldfish objective for suppressing verbatim memorization. Trained exclusively on openly available data with retroactive robots.txt compliance; the authors claim it is the first large model compliant with the EU AI Act. Post-trained via SFT + QRPO alignment. Released under Apache 2.0.
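
As a rough illustration of the idea behind a Goldfish-style objective, the sketch below excludes a pseudorandom subset of token positions from the next-token loss, so the model never receives a training signal for every token of a passage. The mask is derived from a hash of the preceding tokens, so a repeated passage is masked the same way each time it appears. The function names, the drop rate `k`, the context width, and the use of SHA-256 are illustrative assumptions, not the settings used for Apertus.

```python
import hashlib


def goldfish_mask(token_ids, k=4, context_width=3):
    """Return one boolean per token: True means the token counts toward the loss.

    Roughly 1 in k positions is dropped. The decision depends only on the
    `context_width` preceding tokens, so it is deterministic per context.
    Use k >= 2; with k == 1 every hashed position would be dropped.
    """
    mask = []
    for i in range(len(token_ids)):
        if i < context_width:
            # Not enough preceding tokens to hash; keep the position.
            mask.append(True)
            continue
        context = tuple(token_ids[i - context_width:i])
        digest = hashlib.sha256(repr(context).encode("utf-8")).digest()
        bucket = int.from_bytes(digest[:8], "big") % k
        mask.append(bucket != 0)  # bucket 0 is excluded from the loss
    return mask


def masked_nll(token_log_probs, mask):
    """Mean negative log-likelihood over the kept positions only."""
    kept = [-lp for lp, keep in zip(token_log_probs, mask) if keep]
    return sum(kept) / len(kept) if kept else 0.0
```

In a real training loop the same mask would typically be applied to the per-token cross-entropy before averaging; here `token_log_probs` stands in for the model's log-probabilities of the correct next tokens.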

The 70B model is competitive with Llama 3.1-70B on average across multilingual benchmarks (67.5% vs 67.3%). AA Intelligence Index: 8 (70B Instruct), 6 (8B Instruct). It is significantly behind frontier models on English-only benchmarks, but it is the strongest fully open model for extreme multilingual breadth.

Model Details

Architecture: Dense
Parameters: 70B
Context window: 65,536 tokens
Training tokens: 15T
AA Intelligence Index: 8
License: Apache-2.0

Variants

Name                  Parameters  Notes
Apertus-70B-Instruct  70B         AA Intelligence Index 8
Apertus-70B           70B         Base model
Apertus-8B-Instruct   8B          AA Intelligence Index 6
Apertus-8B            8B          Base model

open-source, open-weight, multilingual