Specialized model for few-shot audio understanding and environmental sound classification. Pre-trained on 100M+ hours of audio, it achieves state-of-the-art (SOTA) results on speech intelligence and audio understanding benchmarks.

Outputs (2)

MiMo-Audio-7B (model)
Architecture: Dense
Parameters: 7B
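The card lists a 7B dense model, so a rough estimate of the memory needed just to store the weights may be useful when choosing hardware. The sketch below assumes the nominal 7 × 10⁹ parameter count and standard per-dtype byte sizes. The helper `weight_memory_gib` and its dtype table are illustrative and not part of any MiMo-Audio release. Real usage will be higher once activations, the KV cache, and any additional components are loaded.

```python
# Rough weight-only memory estimate for a dense model.
# Assumption: nominal parameter count and standard byte sizes per dtype;
# excludes activations, KV cache, and framework overhead.
BYTES_PER_PARAM = {"fp32": 4, "bf16": 2, "fp16": 2, "int8": 1, "int4": 0.5}


def weight_memory_gib(num_params: float, dtype: str = "bf16") -> float:
    """Return approximate weight storage in GiB (2**30 bytes)."""
    return num_params * BYTES_PER_PARAM[dtype] / 2**30


if __name__ == "__main__":
    n = 7e9  # nominal "7B" from the card; the exact count may differ
    for dtype in ("fp32", "bf16", "int8", "int4"):
        print(f"{dtype}: ~{weight_memory_gib(n, dtype):.1f} GiB")
```

Running it prints roughly 26.1 GiB for fp32, 13.0 GiB for bf16, 6.5 GiB for int8, and 3.3 GiB for int4, so a bf16 copy of the weights alone needs a GPU with more than 13 GiB of free memory.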

Date approximate.

MiMo-Audio: Audio Language Models are Few-Shot Learners (paper)

Audio language model pre-trained on 100M+ hours of audio. It achieves SOTA results on speech intelligence and audio understanding benchmarks.

arXiv: 2512.23808

audio, open-weight