A ~465B-total-parameter Mixture-of-Experts (MoE) Japanese LLM, built by sparse upcycling (Komatsuzaki et al., 2022) from Sarashina2-70B: 8 experts of 70B parameters each. The total sits below 8 × 70B = 560B because only the feed-forward blocks are replicated per expert, while attention layers and embeddings are shared. It is the first publicly available 400B-class Japanese LLM, framed as a contribution to Japanese academia and industry. BF16 inference requires 16× H100 or A100-80GB GPUs. Released under the Sarashina Model NonCommercial License.
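For intuition, the sketch below shows the core of sparse upcycling in PyTorch: each dense feed-forward block is copied into N identical experts and paired with a freshly initialized router, while the rest of the network is reused as-is. This is a minimal illustration of the technique from Komatsuzaki et al. (2022), not SB Intuitions' actual training code; the class name, `top_k` value, and routing details are assumptions.

```python
import copy
import torch
import torch.nn as nn
import torch.nn.functional as F

class UpcycledMoEBlock(nn.Module):
    """Sparse-upcycled MoE layer: N copies of a dense FFN plus a new router.

    Hypothetical illustration, not the Sarashina2-8x70B implementation.
    """

    def __init__(self, dense_ffn: nn.Module, hidden_size: int,
                 num_experts: int = 8, top_k: int = 2):
        super().__init__()
        # Upcycling step: every expert starts as an exact copy of the
        # pretrained dense FFN, so the upcycled model initially behaves
        # like the dense model; experts diverge during continued training.
        self.experts = nn.ModuleList(
            copy.deepcopy(dense_ffn) for _ in range(num_experts)
        )
        # The router is the only newly initialized component.
        self.router = nn.Linear(hidden_size, num_experts, bias=False)
        self.top_k = top_k

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (num_tokens, hidden_size)
        logits = self.router(x)                             # (T, E)
        weights, indices = logits.topk(self.top_k, dim=-1)  # (T, k)
        weights = F.softmax(weights, dim=-1)
        out = torch.zeros_like(x)
        # Send each token to its top-k experts and mix their outputs.
        for slot in range(self.top_k):
            for e, expert in enumerate(self.experts):
                mask = indices[:, slot] == e
                if mask.any():
                    out[mask] += weights[mask, slot, None] * expert(x[mask])
        return out
```

Because all experts begin as identical copies, upcycling preserves the dense model's function at initialization; only the router must be learned from scratch.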

Scaling decisions were informed by SB Intuitions' concurrent scaling-laws work on upcycling (Liew et al., ICML 2025), which identifies the regime in which converting a dense LLM into an MoE remains compute-efficient and the point at which those gains saturate. This release is a base model only; it is not instruction-tuned.

Model Details

Architecture: MoE (sparse upcycled Mixture of Experts)
Base model: Sarashina2-70B
Tags: open-weight, moe, multilingual, japanese, frontier
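A minimal loading sketch with Hugging Face Transformers follows. The repository id and prompt are assumptions for illustration; confirm them against the official model card. With `device_map="auto"`, the BF16 weights are sharded across all visible GPUs (the 16× 80 GB configuration noted above).

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Hypothetical repo id; check the official Sarashina model card.
MODEL_ID = "sbintuitions/sarashina2-8x70b"

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModelForCausalLM.from_pretrained(
    MODEL_ID,
    torch_dtype=torch.bfloat16,  # BF16 inference, per the requirements above
    device_map="auto",           # shard weights across available GPUs
)

# Base model (not instruction-tuned): use plain text completion.
inputs = tokenizer("日本の首都は", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=32, do_sample=False)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```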
