Most advanced agentic model: a 196B-parameter MoE (11B active) that uses MTP-3 and hybrid attention, optimized for speed (350 tok/s) with a 256k context window. Step-3.5-Flash leverages **StepCrawl**, a proprietary high-signal data acquisition system whose URL selection layer prioritizes information-dense documents (especially PDFs) rather than relying on standard web-scale crawls. Released alongside a 1.6M-row instruction-tuning dataset.

Outputs (3)

Step-3.5-Flash

model

Architecture: MoE
Parameters: 196B
Active params: 11B
Context window: 256,000 tokens
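
To put the figures above in perspective, here is a minimal back-of-the-envelope sketch in Python. It uses only the numbers quoted on this card (196B total parameters, 11B active, 350 tok/s, 256,000-token context). The helper names are hypothetical, and the timing assumes the quoted 350 tok/s is sustained throughout, which is an idealization rather than a measured result.

```python
# Figures quoted on this card; everything else is illustrative.
TOTAL_PARAMS = 196e9       # total parameters
ACTIVE_PARAMS = 11e9       # parameters active per token
TOKENS_PER_SECOND = 350    # quoted generation speed
CONTEXT_WINDOW = 256_000   # quoted context window, in tokens


def active_fraction(total: float, active: float) -> float:
    """Share of the model's parameters used for each token."""
    return active / total


def seconds_to_generate(num_tokens: int, tokens_per_second: float) -> float:
    """Idealized wall-clock time, assuming a constant generation speed."""
    return num_tokens / tokens_per_second


frac = active_fraction(TOTAL_PARAMS, ACTIVE_PARAMS)
full_window_s = seconds_to_generate(CONTEXT_WINDOW, TOKENS_PER_SECOND)

print(f"Active fraction: {frac:.1%}")
print(f"Time to emit {CONTEXT_WINDOW:,} tokens: {full_window_s / 60:.1f} min")
```

Under these assumptions, roughly 1 in 18 parameters is active per token, and emitting a full window's worth of tokens would take on the order of twelve minutes.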

Step 3.5 Flash: Open Frontier-Level Intelligence with 11B Active Parameters

paper

Details the RL pipeline and MTP-3 acceleration for the Step 3.5 Flash model.

arXiv: 2602.10604

Step-3.5-Flash-SFT

dataset

A 1.6M-row instruction-tuning dataset released to the community.

Tags: moe, agentic, efficiency, open-weight, reasoning, training-data, training