NOVA
model
Non-quantized autoregressive video generation model (ICLR 2025). Reformulates video generation as non-quantized autoregressive modeling that combines frame-by-frame temporal prediction with set-by-set spatial prediction. Achieves a VBench score of 80.1 at a processing speed of 2.75 FPS, and was trained in only 342 GPU days on A100-40G GPUs. With only 0.6B parameters, it surpasses prior autoregressive video models in data efficiency, inference speed, visual fidelity, and video fluency. It also outperforms state-of-the-art image diffusion models in text-to-image generation.
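The generation order described above (frames one after another, and within each frame one set of tokens at a time) can be sketched as a toy schedule. This is only an illustrative sketch, not NOVA's implementation: the function name `generation_schedule`, its parameters, the random token order within a frame, and the equal-size split into sets are all assumptions made for the example, and no actual prediction is performed.

```python
import random


def generation_schedule(num_frames, tokens_per_frame, sets_per_frame, seed=0):
    """Yield (frame_index, token_indices) steps in generation order.

    Hypothetical helper: it only enumerates the order in which token
    positions would be produced, it does not predict any token values.
    """
    rng = random.Random(seed)
    # Ceiling division so that every token lands in some set.
    set_size = -(-tokens_per_frame // sets_per_frame)
    for frame in range(num_frames):  # temporal axis: frame by frame
        order = list(range(tokens_per_frame))
        rng.shuffle(order)  # assumption: random token order inside a frame
        for start in range(0, tokens_per_frame, set_size):  # spatial axis: set by set
            yield frame, sorted(order[start:start + set_size])


schedule = list(generation_schedule(num_frames=3, tokens_per_frame=8, sets_per_frame=4))
for frame, token_set in schedule:
    print(f"frame {frame}: predict tokens {token_set}")
```

In a non-quantized model, each step would emit continuous-valued tokens for the listed positions instead of indices into a discrete codebook; the schedule above only shows the ordering of those steps.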
Paper
arXiv: 2412.14169
Venue: ICLR 2025