Sarashina2
SB Intuitions' first frontier Japanese LLM series and the foundation of all later Sarashina releases. The series comprises 7B, 13B, and 70B dense Llama-2-style Transformers with RoPE, SwiGLU, and a 102,400-token SentencePiece unigram vocabulary (no Japanese pre-tokenization). The 70B variant has 80 layers, a hidden dimension of 8192, and 64 attention heads. All variants were trained from scratch on 2.1T tokens: roughly 1T of Japanese (Common Crawl cleaned with CCNet and HojiChar) plus English from SlimPajama-627B (with books3 removed for copyright reasons). Released in August 2024 under the MIT license.
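The architecture numbers above can be sanity-checked with back-of-the-envelope arithmetic. The sketch below assumes an FFN intermediate size of 28,672 and grouped-query attention with 8 KV heads, both borrowed from Llama-2-70B (which this series is stated to follow); the actual Sarashina2 config may differ.

```python
# Rough parameter count for Sarashina2-70B from the stated architecture.
VOCAB = 102_400
HIDDEN = 8_192
LAYERS = 80
Q_HEADS = 64
KV_HEADS = 8                    # assumed (Llama-2-70B-style GQA)
HEAD_DIM = HIDDEN // Q_HEADS    # 128
FFN_INTERMEDIATE = 28_672       # assumed (Llama-2-70B value)

# Token embedding plus an untied output projection.
embed = 2 * VOCAB * HIDDEN

# Attention: Wq and Wo are full-width; Wk and Wv project to the KV width.
kv_width = KV_HEADS * HEAD_DIM
attn = 2 * HIDDEN * HIDDEN + 2 * HIDDEN * kv_width

# SwiGLU FFN uses three weight matrices (gate, up, down).
ffn = 3 * HIDDEN * FFN_INTERMEDIATE

total = embed + LAYERS * (attn + ffn)
print(f"{total / 1e9:.1f}B parameters")  # lands close to the advertised 70B
```

Under these assumptions the total comes out just over 70B, consistent with the model's name.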
Sarashina2-70B is competitive with the top Japanese LLMs on the Swallow leaderboard and excels at Japan-specific QA such as the abc-multiple-choice and AI King quiz sets. Its tokenizer is notably token-efficient for Japanese text relative to other LLMs. The series consists of base models only (no instruction tuning) and served directly as the foundation for Sarashina2-8x70B (a sparse-upcycled MoE) as well as later instruction-tuned releases.
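Token efficiency is commonly quantified as characters per token: the more characters a tokenizer packs into each token, the fewer tokens (and the less compute and context budget) a given Japanese text costs. A minimal sketch of the metric, using a toy character-level tokenizer as a baseline; the Hugging Face repository ID in the comment is an assumption, not confirmed by this page.

```python
def chars_per_token(text: str, tokens: list) -> float:
    """Characters of input per produced token; higher = more efficient."""
    return len(text) / len(tokens)

# Toy baseline: a character-level tokenizer emits one token per character,
# so its efficiency is exactly 1.0 by construction.
text = "吾輩は猫である。名前はまだ無い。"
char_tokens = list(text)
baseline = chars_per_token(text, char_tokens)
print(baseline)  # 1.0

# To measure a real subword tokenizer, e.g. via Hugging Face transformers
# (repo ID below is assumed -- check the actual model card):
#   from transformers import AutoTokenizer
#   tok = AutoTokenizer.from_pretrained("sbintuitions/sarashina2-7b")
#   print(chars_per_token(text, tok.tokenize(text)))
```

A subword vocabulary trained directly on raw Japanese (as here, with no pre-tokenization) should score well above this character-level baseline.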
Model Details
Variants
| Name | Parameters | Notes |
|---|---|---|
| Sarashina2-7B | 7B | — |
| Sarashina2-13B | 13B | — |
| Sarashina2-70B | 70B | — |