Native omni-modal model supporting streaming audio-visual interaction. 560B MoE (27B active), 128K context, millisecond-level end-to-end latency, 8+ minutes of real-time audio-visual interaction. Benchmarks: 61.4 OmniBench, 78.2 VideoMME, 88.7 VoiceBench.

Outputs 2

LongCat-Flash-Omni

model
Architecture MoE
Parameters 560B
Active params 27B
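
To put the two parameter counts above in proportion, the short sketch below computes what share of the model's weights are listed as active. It uses only the 560B and 27B figures from this entry; the function and constant names are hypothetical and chosen for illustration.

```python
# Figures taken from the listing above; names are illustrative only.
TOTAL_PARAMS_B = 560   # total parameters, in billions
ACTIVE_PARAMS_B = 27   # active parameters, in billions


def active_fraction(active_b: float, total_b: float) -> float:
    """Return the share of total parameters that are active."""
    if total_b <= 0:
        raise ValueError("total_b must be positive")
    return active_b / total_b


fraction = active_fraction(ACTIVE_PARAMS_B, TOTAL_PARAMS_B)
print(f"Active share: {fraction:.1%}")  # prints "Active share: 4.8%"
```

Roughly one twentieth of the weights are in use at a time, which is the usual motivation for a sparse MoE design: compute cost tracks the active count rather than the total.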

LongCat-Flash-Omni Technical Report

paper

arXiv: 2511.00279

moe, multimodal, audio, open-weight

Related