Step-Audio-R1
First audio LLM to unlock test-time compute scaling via Chain-of-Thought reasoning. 33B parameters. Surpasses Gemini 2.5 Pro on audio understanding benchmarks.
Model Details
Architecture Dense
Parameters 33B
Variants
| Name | Parameters | Notes |
|---|---|---|
| Step-Audio-R1 | — | Released Nov 27, 2025 |
| Step-Audio-R1.1 | — | Released Jan 14, 2026. Dual-Brain Architecture for real-time spoken dialogue. |
| Step-Audio-R1.1 (Realtime) | — | Top-ranked speech-to-speech model on Artificial Analysis's Big Bench Audio (96.4%, May 2026), ahead of xAI's Grok Voice Agent. |
Paper
Notes
arXiv submission Nov 19, 2025. Model weights released Nov 27, 2025.