Scientific foundation model for electroencephalography (EEG): a 380M-parameter masked diffusion autoencoder trained on a harmonized corpus of 208 public datasets spanning ~2 million channel-hours. Tokenizes multichannel EEG into short temporal windows and uses 4D rotary positional encoding over (x,y,z,t) to inject spatiotemporal structure, allowing inference on arbitrary channel subsets and electrode positions.
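The 4D rotary encoding described above can be sketched roughly as follows. This is a minimal illustration, not the model's actual implementation: the function name, the equal four-way split of embedding dimensions across (x, y, z, t), and the frequency base are all assumptions; the core idea is that each token's embedding is rotated by its position along each of the four coordinates, so electrodes at arbitrary 3D positions receive consistent positional structure.

```python
import numpy as np

def rope_4d(tokens, coords, base=10000.0):
    """Sketch of 4D rotary positional encoding (hypothetical layout).

    tokens: (n, d) token embeddings, d divisible by 8
    coords: (n, 4) per-token (x, y, z, t) positions
    The embedding is split into four equal groups; each group is
    rotated pairwise by the position along one coordinate axis.
    """
    n, d = tokens.shape
    assert d % 8 == 0, "need an even number of dims per axis group"
    d_axis = d // 4                                   # dims per coordinate axis
    out = np.empty_like(tokens)
    for a in range(4):                                # one group per axis
        g = tokens[:, a * d_axis:(a + 1) * d_axis]
        half = d_axis // 2
        freqs = base ** (-np.arange(half) / half)     # per-pair rotation frequencies
        ang = coords[:, a:a + 1] * freqs[None, :]     # (n, half) rotation angles
        cos, sin = np.cos(ang), np.sin(ang)
        g1, g2 = g[:, :half], g[:, half:]             # paired halves to rotate
        out[:, a * d_axis:(a + 1) * d_axis] = np.concatenate(
            [g1 * cos - g2 * sin, g1 * sin + g2 * cos], axis=1)
    return out
```

Because the transform is a pure rotation, it preserves embedding norms, and tokens at the coordinate origin pass through unchanged; relative offsets in (x, y, z, t) then show up as phase differences in attention scores, which is what lets the model handle arbitrary electrode subsets and positions.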

Performs masked channel infilling and EEG super-resolution, with explicit cross-dataset and cross-electrode-layout generalization: the paper shows it can be applied directly to novel datasets and problems without retraining, outperforming spherical-spline interpolation and prior deep-learning methods. Open weights. A clear demonstration that Zyphra's hybrid-architecture toolkit extends beyond language into scientific domains.

Model Details

Parameters: 380M
Training data: ~2M channel-hours across 208 public datasets

Paper

science · foundational · open-weight