Switzerland's sovereign multilingual LLM. 8B and 70B dense Transformers trained on 15 trillion tokens across 1,811 languages (~40% non-English) using 4,096 NVIDIA GH200 GPUs on the CSCS Alps supercomputer (10M+ GPU hours). 65K context. 101+ authors.

Novel contributions: the xIELU activation function, the AdEMAMix optimizer, and the Goldfish objective for suppressing verbatim memorization (hedged sketches of each follow). Trained exclusively on openly available data with retroactive robots.txt compliance; the authors claim it is the first large model compliant with the EU AI Act. Post-trained via SFT and QRPO alignment. Apache 2.0 license.
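A minimal sketch of the xIELU activation, assuming the piecewise form from the xIELU paper (a quadratic branch for positive inputs and an integral-of-ELU branch for negative inputs, with trainable per-channel coefficients); the exact constants and parameterization may differ from what Apertus uses.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class XIELU(nn.Module):
    """Sketch of xIELU (integral of an expanded ELU).

    Assumed piecewise form (constants may differ from the paper):
      x > 0:  alpha_p * x^2 + 0.5 * x
      x <= 0: alpha_n * (exp(x) - x - 1) + 0.5 * x
    alpha_p, alpha_n are trainable per channel, kept positive via softplus.
    """

    def __init__(self, num_channels: int, alpha_init: float = 0.8):
        super().__init__()
        # Inverse-softplus init so softplus(raw) == alpha_init at the start.
        inv = torch.log(torch.expm1(torch.tensor(alpha_init))).item()
        self.alpha_p_raw = nn.Parameter(torch.full((num_channels,), inv))
        self.alpha_n_raw = nn.Parameter(torch.full((num_channels,), inv))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        alpha_p = F.softplus(self.alpha_p_raw)
        alpha_n = F.softplus(self.alpha_n_raw)
        pos = alpha_p * x.pow(2) + 0.5 * x
        # expm1(x) - x == exp(x) - 1 - x, so the branch is 0 at x = 0.
        neg = alpha_n * (torch.expm1(x) - x) + 0.5 * x
        return torch.where(x > 0, pos, neg)
```

Both branches and their derivatives meet at zero, so the activation is smooth there regardless of the learned coefficients.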
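A sketch of one AdEMAMix step, following the update rule from the AdEMAMix paper: Adam's fast gradient EMA is mixed with a second, much slower EMA that retains old gradients far longer. The paper also warms up alpha and beta3 over training; that scheduling is omitted here, and the hyperparameter values are illustrative rather than Apertus's.

```python
import torch

def ademamix_step(param, grad, state, lr=1e-3, betas=(0.9, 0.999, 0.9999),
                  alpha=5.0, eps=1e-8, weight_decay=0.0):
    """One AdEMAMix update (sketch; alpha/beta3 schedules omitted).

    Update direction: (m1_hat + alpha * m2) / (sqrt(v_hat) + eps),
    where m1 is Adam's fast EMA and m2 is a slow EMA (not bias-corrected).
    Initialize as:
      state = {"step": 0, "m1": torch.zeros_like(param),
               "m2": torch.zeros_like(param), "v": torch.zeros_like(param)}
    """
    beta1, beta2, beta3 = betas
    state["step"] += 1
    t = state["step"]

    state["m1"].mul_(beta1).add_(grad, alpha=1 - beta1)      # fast EMA
    state["m2"].mul_(beta3).add_(grad, alpha=1 - beta3)      # slow EMA
    state["v"].mul_(beta2).addcmul_(grad, grad, value=1 - beta2)

    m1_hat = state["m1"] / (1 - beta1 ** t)   # bias-correct fast EMA only
    v_hat = state["v"] / (1 - beta2 ** t)

    update = (m1_hat + alpha * state["m2"]) / (v_hat.sqrt() + eps)
    param.add_(update + weight_decay * param, alpha=-lr)     # decoupled WD
```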
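And a sketch of the Goldfish idea: drop a deterministic pseudorandom subset of tokens from the language-modeling loss, chosen by hashing the preceding context, so the model never receives a gradient toward reproducing those exact tokens. The hash below is a toy stand-in, and k and h are illustrative, not Apertus's settings.

```python
import torch
import torch.nn.functional as F

def goldfish_loss(logits, labels, k=4, h=13):
    """Goldfish loss sketch: exclude ~1/k of tokens from the LM loss.

    logits: (B, T, V) float; labels: (B, T) long. Each position is dropped
    based on a deterministic hash of the previous h labels, so repeated
    passes over the same text always mask the same positions.
    """
    B, T, V = logits.shape
    per_tok = F.cross_entropy(logits.reshape(-1, V), labels.reshape(-1),
                              reduction="none").view(B, T)

    # Toy hash of each h-token window (the paper uses a proper hash).
    windows = labels.unfold(1, h, 1)                      # (B, T-h+1, h)
    weights = torch.arange(1, h + 1, device=labels.device)
    drop = ((windows * weights).sum(-1) % k) == 0         # ~1/k of positions

    keep = torch.ones(B, T, dtype=torch.bool, device=labels.device)
    keep[:, h - 1:] = ~drop                               # first h-1 kept
    return per_tok[keep].mean()
```

Because the mask is a function of the text itself rather than of the training step, seeing a document multiple times still withholds the same tokens, which is what suppresses verbatim memorization.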

The 70B is competitive with Llama 3.1-70B on average across multilingual benchmarks (67.5% vs 67.3%). Artificial Analysis Intelligence Index: 6 (8B). Significantly behind frontier models on English-only benchmarks, but the strongest fully open model for extreme multilingual breadth.

Model Details

Architecture: Dense
Parameters: 70B
Context window: 65,536 tokens

Variants

Name          Parameters   Notes
Apertus-8B    8B           —
Apertus-70B   70B          —

Paper

arXiv: 2509.14233

Tags: open-source, open-weight, multilingual