Multimodal-driven architecture for customized video generation. Enables identity-preserving, style-consistent, and subject-driven video creation from reference images and text prompts.

Paper

arXiv: 2505.04512

video generation customization