SB Intuitions' first frontier Japanese LLM series and the foundation of all later Sarashina releases. 7B, 13B, and 70B dense Llama-2-style Transformers with RoPE, SwiGLU, and a 102,400-token SentencePiece unigram vocabulary (no Japanese pre-tokenization). The 70B has 80 layers, an 8192 hidden dimension, and 64 attention heads. All variants were trained from scratch on 2.1T tokens: ~1T Japanese (Common Crawl cleaned with CCNet and HojiChar) plus English from SlimPajama-627B (with books3 removed for copyright reasons). Released August 2024 under the MIT license.
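The stated figures for the 70B (80 layers, hidden size 8192, 64 heads, 102,400-token vocabulary) can be sanity-checked with a back-of-the-envelope parameter count. This sketch assumes Llama-2-70B-style choices that the card does not state: grouped-query attention with 8 KV heads, a SwiGLU intermediate size of 28,672, and untied input/output embeddings.

```python
# Rough parameter count for Sarashina2-70B from the figures above.
# ASSUMPTIONS not stated in the card: grouped-query attention with
# 8 KV heads and FFN width 28672 (both Llama-2-70B values), untied
# input/output embeddings; norm and bias terms are ignored.

VOCAB, D, LAYERS, HEADS, KV_HEADS, FFN = 102_400, 8192, 80, 64, 8, 28_672

head_dim = D // HEADS                    # 128
kv_dim = KV_HEADS * head_dim             # 1024 under GQA

attn = D * D + 2 * kv_dim * D + D * D    # Q, K, V, O projections
ffn = 3 * D * FFN                        # SwiGLU: gate, up, down
per_layer = attn + ffn
embeddings = 2 * VOCAB * D               # input embedding + LM head

total = LAYERS * per_layer + embeddings
print(f"{total / 1e9:.1f}B parameters")  # lands close to the nominal 70B
```

Under these assumptions the total comes out near 70B, which is consistent with the card; with full multi-head attention instead of GQA it would overshoot to roughly 79B.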

Sarashina2-70B is competitive with the top Japanese LLMs on the Swallow leaderboard and excels on Japan-specific QA such as the abc-multiple-choice and AI King quiz sets. Its tokenizer is notably token-efficient for Japanese text relative to other LLMs. The series is base-only (no instruction tuning) and served directly as the base for Sarashina2-8x70B (a sparse-upcycled MoE) as well as later instruction-tuned releases.

Model Details

Architecture Dense
Parameters 70B
Context window 4,096 tokens
Training tokens 2.1T

Variants

Name Parameters Notes
Sarashina2-7B 7B
Sarashina2-13B 13B
Sarashina2-70B 70B 80 layers, 8192 hidden dim, 64 attention heads
Tags: open-weight, multilingual, japanese
