13.6B-parameter foundational video generation model unifying text-to-video, image-to-video, and video-continuation tasks. Uses a Diffusion Transformer with block sparse attention and multi-reward GRPO. Includes a Video-Avatar variant for identity-consistent portrait generation.

Outputs 3

LongCat-Video

model
Architecture: Dense
Parameters: 13.6B

LongCat-Video Technical Report

paper

arXiv: 2510.22200

LongCat-Video-Avatar

model

Identity-consistent portrait generation variant.

video, generation, open-weight
