LongCat-Video

13.6B-parameter foundational video generation model unifying text-to-video, image-to-video, and video-continuation tasks. Uses a Diffusion Transformer with block sparse attention and multi-reward GRPO (Group Relative Policy Optimization). Includes a Video-Avatar variant for identity-consistent portrait generation.

Paper (arXiv) | HuggingFace | GitHub
