"Fast-thinking" model using a Hybrid-Mamba-Transformer architecture for near-instant replies with complex reasoning. 56B activated / 560B total hybrid MoE. 256K context, 16T pre-training tokens.


Hunyuan Turbo S

model
Architecture: MoE
Parameters: 560B total
Active params: 56B
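
The Parameters / Active params split above is the standard sparse-MoE pattern: every expert's weights count toward the 560B total, but each token is routed through only a few experts, so roughly 56B parameters do work per token. Below is a minimal top-k routing sketch, assuming PyTorch; the class name, expert count, and sizes are illustrative assumptions, not Hunyuan's implementation.

import torch
import torch.nn as nn
import torch.nn.functional as F

class MoEFeedForward(nn.Module):
    """Toy top-k MoE FFN: all experts exist (total params),
    but each token only runs through top_k of them (active params)."""
    def __init__(self, d_model: int, d_ff: int, n_experts: int, top_k: int = 2):
        super().__init__()
        self.top_k = top_k
        self.router = nn.Linear(d_model, n_experts, bias=False)
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, d_ff), nn.SiLU(), nn.Linear(d_ff, d_model))
            for _ in range(n_experts)
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (tokens, d_model). Score every expert, keep only the top-k.
        scores = self.router(x)                         # (tokens, n_experts)
        weights, idx = scores.topk(self.top_k, dim=-1)  # (tokens, top_k)
        weights = F.softmax(weights, dim=-1)            # normalize over chosen experts
        out = torch.zeros_like(x)
        for slot in range(self.top_k):
            for e, expert in enumerate(self.experts):
                mask = idx[:, slot] == e                # tokens routed to expert e
                if mask.any():
                    out[mask] += weights[mask, slot].unsqueeze(-1) * expert(x[mask])
        return out

# Toy usage: 8 experts total, 2 active per token, so ~1/4 of the
# expert parameters are exercised for any given token.
moe = MoEFeedForward(d_model=64, d_ff=256, n_experts=8, top_k=2)
print(moe(torch.randn(10, 64)).shape)  # torch.Size([10, 64])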

Hunyuan-TurboS: Mamba-Transformer Synergy

paper

A 56B-activated / 560B-total hybrid MoE with a Mamba-Transformer architecture. 256K context, 16T pre-training tokens.

arXiv: 2505.15431

reasoning, efficiency, mamba, moe
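
The paper's core idea is interleaving cheap constant-state sequence layers (Mamba-style SSMs) with occasional full-attention layers. The sketch below, assuming PyTorch, illustrates that layering pattern only: the recurrence block is a deliberately simplified stand-in (real Mamba uses input-dependent selective dynamics), and the block names and 2:1 ratio are assumptions for illustration, not the paper's configuration.

import torch
import torch.nn as nn

class GatedRecurrenceBlock(nn.Module):
    """Simplified SSM stand-in: per-channel linear recurrence
    h_t = a * h_{t-1} + x_t with a learned decay a in (0, 1)."""
    def __init__(self, d_model: int):
        super().__init__()
        self.decay = nn.Parameter(torch.zeros(d_model))
        self.proj = nn.Linear(d_model, d_model)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        a = torch.sigmoid(self.decay)     # per-channel decay in (0, 1)
        h = torch.zeros_like(x[:, 0])     # O(1) state regardless of seq length
        outs = []
        for t in range(x.size(1)):        # O(seq_len) scan over time
            h = a * h + x[:, t]
            outs.append(h)
        return x + self.proj(torch.stack(outs, dim=1))

class AttentionBlock(nn.Module):
    """Standard self-attention block with a residual connection."""
    def __init__(self, d_model: int, n_heads: int = 4):
        super().__init__()
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.norm = nn.LayerNorm(d_model)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        y, _ = self.attn(x, x, x, need_weights=False)
        return self.norm(x + y)

# Hybrid stack: mostly recurrence blocks (cheap, fixed-size state, good
# for long 256K-style contexts) with periodic attention blocks
# (expensive but content-addressable). Ratio here is illustrative.
layers = nn.Sequential(
    GatedRecurrenceBlock(64), GatedRecurrenceBlock(64), AttentionBlock(64),
    GatedRecurrenceBlock(64), GatedRecurrenceBlock(64), AttentionBlock(64),
)
x = torch.randn(2, 16, 64)   # (batch, seq_len, d_model)
print(layers(x).shape)       # torch.Size([2, 16, 64])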