LLM-jp-4
Latest generation with MoE and "thinking" variants. The flagship is a 32B-parameter MoE with 3.8B active parameters (128 routed experts, top-8 routing; 32 layers, 2560 hidden size, 40 attention heads) and a 65,536-token context window, trained on 11.7T tokens with llm-jp-tokenizer v4.0. Apache 2.0.
The family also includes a dense 8B model (9B actual parameters) with base and thinking variants. MT-Bench JA: 7.57-7.82; MT-Bench EN: 7.70-7.86, evaluated with GPT-4 as judge.
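For orientation, here is a minimal sketch of loading one of these checkpoints with Hugging Face `transformers`. The repository ID is an assumption based on the variant names listed below; check the llm-jp organization on the Hub for the actual paths.

```python
# Minimal sketch: loading an LLM-jp-4 checkpoint with transformers.
# The repo ID below is hypothetical, inferred from the variants table.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "llm-jp/llm-jp-4-32B-A3B"  # assumed repo ID, verify on the Hub

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # only 3.8B params are active per token
    device_map="auto",
)

messages = [{"role": "user", "content": "日本の首都はどこですか？"}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=256)
# Decode only the newly generated tokens.
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```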
Model Details
| Attribute | Value |
|---|---|
| Architecture | MoE |
| Total parameters | 32B |
| Active parameters | 3.8B |
| Context window | 65,536 tokens |
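As a sanity check on these figures, the split between always-active (shared) parameters and routed expert parameters can be backed out from the total, the active count, and the top-8-of-128 routing. The sketch below is a back-of-the-envelope estimate assuming uniform expert sizes and a negligible router, not an official breakdown.

```python
# Back-of-the-envelope MoE parameter accounting. Only the 32B total,
# 3.8B active, and 128-expert/top-8 routing come from the model card;
# uniform expert sizes and a negligible router are assumptions.

TOTAL_PARAMS = 32e9
ACTIVE_PARAMS = 3.8e9
N_EXPERTS = 128
TOP_K = 8

# Shared parameters (embeddings, attention, norms) are active for every
# token; expert parameters are active at rate top_k / n_experts. Solve
#   active = shared + (top_k / n_experts) * (total - shared)
# for the shared share:
ratio = TOP_K / N_EXPERTS
shared = (ACTIVE_PARAMS - TOTAL_PARAMS * ratio) / (1 - ratio)
expert_total = TOTAL_PARAMS - shared

print(f"shared (always-active) params ~ {shared / 1e9:.2f}B")    # ~1.92B
print(f"expert params (all 128)       ~ {expert_total / 1e9:.2f}B")  # ~30.08B
print(f"expert params used per token  ~ {expert_total * ratio / 1e9:.2f}B")  # ~1.88B
```

Under these assumptions the two figures are consistent: roughly 1.9B shared parameters plus 8 of 128 experts (about 1.9B more) gives the stated 3.8B active.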
Variants
| Name | Parameters | Notes |
|---|---|---|
| llm-jp-4-8B | 8B (9B actual) | Dense; base and thinking variants |
| llm-jp-4-32B-A3B | 32B total, 3.8B active | MoE; 128 routed experts, top-8 routing |