Multimodal audio models that support semantic tokenization at 16.7 Hz and real-time "omni" (speech-to-speech) interaction.
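
As a rough illustration of what the 16.7 Hz figure implies, the sketch below converts an audio duration into an approximate semantic token count and computes how much audio one token covers. Only the 16.7 Hz rate comes from the description above. The function names are hypothetical, and rounding to the nearest integer is an assumption; the real tokenizer may frame or pad audio differently.

```python
# Token rate taken from the description above; everything else is illustrative.
SEMANTIC_TOKEN_RATE_HZ = 16.7


def semantic_token_count(duration_s: float,
                         rate_hz: float = SEMANTIC_TOKEN_RATE_HZ) -> int:
    """Approximate number of semantic tokens for a clip of the given length.

    Assumption: the count is simply duration * rate, rounded to the nearest
    integer. Framing and padding in the actual tokenizer are not modeled.
    """
    if duration_s < 0:
        raise ValueError("duration_s must be non-negative")
    return round(duration_s * rate_hz)


def token_span_ms(rate_hz: float = SEMANTIC_TOKEN_RATE_HZ) -> float:
    """Milliseconds of audio covered by one token at the given rate."""
    return 1000.0 / rate_hz


if __name__ == "__main__":
    print(semantic_token_count(10.0))   # tokens for a 10-second clip
    print(f"{token_span_ms():.1f} ms")  # audio covered by one token
```

At this rate one token spans roughly 60 ms of audio, so a 10-second utterance comes to about 167 semantic tokens.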

Model Details

Variants

Name Parameters Notes
Step-Audio
Step-Audio 2
audio, multimodal

Notes

Step-Audio released Feb 17, 2025. Step-Audio 2 released Jul 23, 2025.