InteractiveOmni
model, paper
Unified open-source omni-modal large language model for audio-visual multi-turn interaction. Available in 4B and 8B parameter sizes, it integrates a vision encoder, an audio encoder, an LLM, and a speech decoder into a single model for comprehensive understanding and generation tasks. Described as a leading model among lightweight omni-modal models.
Outputs 2
InteractiveOmni Model
model
Variants
| Name | Parameters | Notes |
|---|---|---|
| InteractiveOmni-4B | 4B | — |
| InteractiveOmni-8B | 8B | — |
InteractiveOmni: A Unified Omni-modal Model for Audio-Visual Multi-turn Dialogue
paper
arXiv: 2510.13747