Hybrid Mamba2 + Sliding Window Attention architecture. The family spans PLaMo 2.0-31B (2T tokens), PLaMo 2-8B (6T tokens; ~45% English, ~30% Japanese, ~15% code), and PLaMo 2-1B. Weight reuse and structural pruning: PLaMo 2.1-8B (pruned from the 31B model, then retrained on 500B tokens with knowledge distillation) matches PLaMo-100B quality at ~7x less compute (55 vs. 372 exaFLOPs).
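
A hybrid layout of this kind interleaves state-space (Mamba2) layers with sliding-window attention layers. Below is a minimal PyTorch sketch of such an interleaving; the 3:1 layer ratio, the window size, and the GRU stand-in for the Mamba2 mixer are illustrative assumptions, not PLaMo's actual configuration.

```python
import torch
import torch.nn as nn

class SlidingWindowAttention(nn.Module):
    """Causal self-attention restricted to a local window (single head, simplified)."""
    def __init__(self, dim: int, window: int):
        super().__init__()
        self.qkv = nn.Linear(dim, 3 * dim)
        self.out = nn.Linear(dim, dim)
        self.window = window

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        B, T, D = x.shape
        q, k, v = self.qkv(x).chunk(3, dim=-1)
        scores = q @ k.transpose(-2, -1) / D ** 0.5
        pos = torch.arange(T, device=x.device)
        # Position t may attend only to positions in [t - window + 1, t].
        mask = (pos[None, :] > pos[:, None]) | (pos[:, None] - pos[None, :] >= self.window)
        return self.out(scores.masked_fill(mask, float("-inf")).softmax(-1) @ v)

class RecurrentMixer(nn.Module):
    """Stand-in for a Mamba2 state-space layer (linear-time, constant-size state)."""
    def __init__(self, dim: int):
        super().__init__()
        self.rnn = nn.GRU(dim, dim, batch_first=True)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.rnn(x)[0]

class HybridStack(nn.Module):
    """Interleave recurrent layers with sliding-window attention (here 3:1)."""
    def __init__(self, dim: int = 512, groups: int = 4, window: int = 2048):
        super().__init__()
        self.layers = nn.ModuleList(
            SlidingWindowAttention(dim, window) if i % 4 == 3 else RecurrentMixer(dim)
            for i in range(groups * 4)
        )
        self.norms = nn.ModuleList([nn.LayerNorm(dim) for _ in self.layers])

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        for norm, layer in zip(self.norms, self.layers):
            x = x + layer(norm(x))  # pre-norm residual connections
        return x

x = torch.randn(1, 64, 512)
print(HybridStack()(x).shape)  # torch.Size([1, 64, 512])
```

The recurrent layers mix the sequence in linear time with a fixed-size state, while the periodic attention layers restore exact token-to-token interactions inside the local window.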

32K context reached via continual pretraining with full attention and RoPE (theta = 1M); a minimal RoPE sketch follows below. Released under the PLaMo Community License (commercial use permitted up to 1B yen in revenue).
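
A larger RoPE base theta slows the per-dimension rotation, so relative-position signals stay distinguishable over the extended 32K range. A minimal sketch of rotary position embeddings with theta = 1M follows; the head dimension is an arbitrary choice for illustration.

```python
import torch

def rope(x: torch.Tensor, theta: float = 1_000_000.0) -> torch.Tensor:
    """Apply rotary position embedding to x of shape (seq, dim); dim must be even."""
    seq, dim = x.shape
    # Per-pair rotation frequencies; a larger theta means slower rotation.
    freqs = theta ** (-torch.arange(0, dim, 2, dtype=torch.float32) / dim)
    angles = torch.arange(seq, dtype=torch.float32)[:, None] * freqs[None, :]
    cos, sin = angles.cos(), angles.sin()
    x1, x2 = x[:, 0::2], x[:, 1::2]
    out = torch.empty_like(x)
    out[:, 0::2] = x1 * cos - x2 * sin  # rotate each (x1, x2) pair by its angle
    out[:, 1::2] = x1 * sin + x2 * cos
    return out

q = torch.randn(32_000, 128)  # 32K positions, head dim 128 (illustrative)
print(rope(q).shape)          # torch.Size([32000, 128])
```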

Model Details

Architecture: Dense
Parameters: 31B
Context window: 32,000 tokens

Variants

| Name | Parameters | Notes |
| --- | --- | --- |
| PLaMo 2-1B | 1B | |
| PLaMo 2-8B | 8B | |
| PLaMo 2.0-31B | 31B | |
| PLaMo 2.1-8B | 8B | Pruned from 31B; matches PLaMo-100B (KD retrain sketch below) |
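
For PLaMo 2.1-8B's retrain step, the row above names structural pruning followed by knowledge distillation from the 31B teacher. Below is a minimal sketch of a standard Hinton-style logit-distillation loss; the temperature, mixing weight, and vocabulary size are illustrative assumptions, not values from the PLaMo report.

```python
import torch
import torch.nn.functional as F

def distill_loss(student_logits, teacher_logits, labels, T=2.0, alpha=0.5):
    """Blend KL to the teacher's softened distribution with the hard-label CE loss.
    T and alpha are illustrative; PLaMo's actual settings may differ."""
    kd = F.kl_div(
        F.log_softmax(student_logits / T, dim=-1),
        F.softmax(teacher_logits / T, dim=-1),
        reduction="batchmean",
    ) * T * T  # rescale so gradients match the hard-label term's magnitude
    ce = F.cross_entropy(student_logits, labels)
    return alpha * kd + (1 - alpha) * ce

student = torch.randn(4, 32000, requires_grad=True)  # pruned student logits
teacher = torch.randn(4, 32000)                      # frozen 31B teacher logits
labels = torch.randint(0, 32000, (4,))
print(distill_loss(student, teacher, labels))
```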

Paper

arXiv: 2509.04897

open-weight · multilingual · architecture · efficiency
