Switzerland's sovereign multilingual LLM. 8B and 70B dense Transformers trained on 15 trillion tokens across 1,811 languages (~40% non-English) using 4,096 NVIDIA GH200 GPUs on the CSCS Alps supercomputer (10M+ GPU hours). 65K context. 101+ authors.

Novel contributions: the xIELU activation function, the AdEMAMix optimizer, and the Goldfish objective for suppressing verbatim memorization (hedged sketches of each follow). Trained exclusively on openly available data with retroactive robots.txt compliance; the authors claim it is the first large model compliant with the EU AI Act. Post-trained via SFT and QRPO alignment. Apache 2.0 license.
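A minimal sketch of the xIELU activation, assuming the piecewise form from the xIELU paper (a quadratic branch for positive inputs and an integral-of-ELU branch for negative inputs, with trainable per-channel coefficients); the exact constants and parameterization may differ from what Apertus uses.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class XIELU(nn.Module):
    """Sketch of xIELU (integral of an expanded ELU).

    Assumed piecewise form (constants may differ from the paper):
      x > 0:  alpha_p * x^2 + 0.5 * x
      x <= 0: alpha_n * (exp(x) - x - 1) + 0.5 * x
    alpha_p, alpha_n are trainable per channel, kept positive via softplus.
    """

    def __init__(self, num_channels: int, alpha_init: float = 0.8):
        super().__init__()
        # Inverse-softplus init so softplus(raw) == alpha_init at the start.
        inv = torch.log(torch.expm1(torch.tensor(alpha_init))).item()
        self.alpha_p_raw = nn.Parameter(torch.full((num_channels,), inv))
        self.alpha_n_raw = nn.Parameter(torch.full((num_channels,), inv))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        alpha_p = F.softplus(self.alpha_p_raw)
        alpha_n = F.softplus(self.alpha_n_raw)
        pos = alpha_p * x.pow(2) + 0.5 * x
        # expm1(x) - x == exp(x) - 1 - x, so the branch is 0 at x = 0.
        neg = alpha_n * (torch.expm1(x) - x) + 0.5 * x
        return torch.where(x > 0, pos, neg)
```

Both branches and their derivatives meet at zero, so the activation is smooth there regardless of the learned coefficients.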
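A sketch of one AdEMAMix step, following the update rule from the AdEMAMix paper: Adam's fast gradient EMA is mixed with a second, much slower EMA that retains old gradients far longer. The paper also warms up alpha and beta3 over training; that scheduling is omitted here, and the hyperparameter values are illustrative rather than Apertus's.

```python
import torch

def ademamix_step(param, grad, state, lr=1e-3, betas=(0.9, 0.999, 0.9999),
                  alpha=5.0, eps=1e-8, weight_decay=0.0):
    """One AdEMAMix update (sketch; alpha/beta3 schedules omitted).

    Update direction: (m1_hat + alpha * m2) / (sqrt(v_hat) + eps),
    where m1 is Adam's fast EMA and m2 is a slow EMA (not bias-corrected).
    Initialize as:
      state = {"step": 0, "m1": torch.zeros_like(param),
               "m2": torch.zeros_like(param), "v": torch.zeros_like(param)}
    """
    beta1, beta2, beta3 = betas
    state["step"] += 1
    t = state["step"]

    state["m1"].mul_(beta1).add_(grad, alpha=1 - beta1)      # fast EMA
    state["m2"].mul_(beta3).add_(grad, alpha=1 - beta3)      # slow EMA
    state["v"].mul_(beta2).addcmul_(grad, grad, value=1 - beta2)

    m1_hat = state["m1"] / (1 - beta1 ** t)   # bias-correct fast EMA only
    v_hat = state["v"] / (1 - beta2 ** t)

    update = (m1_hat + alpha * state["m2"]) / (v_hat.sqrt() + eps)
    param.add_(update + weight_decay * param, alpha=-lr)     # decoupled WD
```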
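And a sketch of the Goldfish idea: drop a deterministic pseudorandom subset of tokens from the language-modeling loss, chosen by hashing the preceding context, so the model never receives a gradient toward reproducing those exact tokens. The hash below is a toy stand-in, and k and h are illustrative, not Apertus's settings.

```python
import torch
import torch.nn.functional as F

def goldfish_loss(logits, labels, k=4, h=13):
    """Goldfish loss sketch: exclude ~1/k of tokens from the LM loss.

    logits: (B, T, V) float; labels: (B, T) long. Each position is dropped
    based on a deterministic hash of the previous h labels, so repeated
    passes over the same text always mask the same positions.
    """
    B, T, V = logits.shape
    per_tok = F.cross_entropy(logits.reshape(-1, V), labels.reshape(-1),
                              reduction="none").view(B, T)

    # Toy hash of each h-token window (the paper uses a proper hash).
    windows = labels.unfold(1, h, 1)                      # (B, T-h+1, h)
    weights = torch.arange(1, h + 1, device=labels.device)
    drop = ((windows * weights).sum(-1) % k) == 0         # ~1/k of positions

    keep = torch.ones(B, T, dtype=torch.bool, device=labels.device)
    keep[:, h - 1:] = ~drop                               # first h-1 kept
    return per_tok[keep].mean()
```

Because the mask is a function of the text itself rather than of the training step, seeing a document multiple times still withholds the same tokens, which is what suppresses verbatim memorization.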

The 70B is competitive with Llama 3.1-70B on average across multilingual benchmarks (67.5% vs 67.3%). Artificial Analysis Intelligence Index: 6 (8B). Significantly behind frontier models on English-only benchmarks, but the strongest fully open model for extreme multilingual breadth.

Model Details

Architecture: Dense
Parameters: 70B
Context window: 65,536 tokens

Variants

Name          Parameters   Notes
Apertus-8B    8B           —
Apertus-70B   70B          —

Paper

arXiv: 2509.14233

Tags: open-source, open-weight, multilingual