MiMo-Audio | Lab Index

Specialized model for few-shot audio understanding and environmental sound classification. Pre-trained on 100M+ hours of audio, it achieves state-of-the-art (SOTA) results on speech intelligence and audio understanding benchmarks.

HuggingFace | Paper (arXiv)

Outputs 2

MiMo-Audio-7B

model

HuggingFace

Architecture: Dense

Parameters: 7B

Date approximate.

MiMo-Audio: Audio Language Models are Few-Shot Learners

paper 2025-12-29

Audio language model pre-trained on 100M+ hours of audio. It achieves SOTA results on speech intelligence and audio understanding benchmarks.

Paper (arXiv)

arXiv HTML

audio, open-weight