Text-to-video generation model built on a diffusion transformer that operates on spacetime patches. The December 2024 public release generates video at up to 1080p resolution and up to 20 seconds long, from text descriptions or images. Because each video is represented as a collection of 3D spacetime patches, a single model can handle variable duration, resolution, and aspect ratio.
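The spacetime-patch representation can be sketched in a few lines. This is a minimal illustration, not OpenAI's implementation. The function name `spacetime_patchify`, the (T, H, W, C) array layout, and the patch sizes are assumptions. Sora's technical report describes cutting patches from a compressed latent video rather than raw pixels, and it does not publish the patch dimensions.

```python
import numpy as np

def spacetime_patchify(video: np.ndarray, pt: int, ph: int, pw: int) -> np.ndarray:
    """Split a (T, H, W, C) array into flattened spacetime patches.

    Hypothetical helper for illustration. Returns an array of shape
    (num_patches, pt * ph * pw * C): one row per token, ordered by
    (time, height, width) patch index with width varying fastest.
    """
    T, H, W, C = video.shape
    if T % pt or H % ph or W % pw:
        raise ValueError("each dimension must be divisible by its patch size")
    nt, nh, nw = T // pt, H // ph, W // pw
    # Split each axis into (number of patches, extent within a patch).
    x = video.reshape(nt, pt, nh, ph, nw, pw, C)
    # Move the patch-grid axes to the front and the within-patch axes to the back.
    x = x.transpose(0, 2, 4, 1, 3, 5, 6)
    # Flatten the grid into a token sequence and each patch into one vector.
    return x.reshape(nt * nh * nw, pt * ph * pw * C)

# Hypothetical latent videos: two different durations and aspect ratios.
short_wide = np.zeros((8, 32, 64, 4))
long_tall = np.zeros((16, 64, 32, 4))
print(spacetime_patchify(short_wide, 2, 8, 8).shape)  # (128, 512)
print(spacetime_patchify(long_tall, 2, 8, 8).shape)   # (256, 512)
```

Both clips produce tokens of the same width (512 values), but the number of tokens depends on duration and frame size. A transformer that consumes a variable-length token sequence can therefore accept clips of different lengths, resolutions, and aspect ratios without resizing or cropping them to a fixed shape. This sketch omits how the real model handles dimensions that are not divisible by the patch size.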

Sora provided evidence that the scaling behavior of transformers carries over to video generation, producing coherent scenes with complex motion, multiple characters, and consistent 3D geometry. It was previewed in February 2024 and launched publicly as Sora Turbo in December 2024. It was succeeded by Sora 2 (September 2025), which added synchronized dialogue. Proprietary.

vision, multimodal, video