CogView is the first large-scale generative model for general-domain text-to-image synthesis in Chinese. It is a 4-billion-parameter Transformer that uses a VQ-VAE tokenizer to turn images into discrete tokens. At release, it achieved state-of-the-art FID on the blurred MS COCO benchmark, outperforming earlier GAN-based models and OpenAI's DALL-E. It laid the foundation for the "Cog" series of models and demonstrated the viability of large-scale autoregressive image generation.
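The recipe described above has two parts. A VQ-VAE maps image patches to discrete codebook indices, and a Transformer models text tokens and image tokens autoregressively as one sequence. The toy sketch below shows only those data-flow steps. It is not CogView's code: the codebook, patch features, separator id, and vocabulary offset are all hypothetical, and the encoder network that produces patch features (and the Transformer itself) is omitted.

```python
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def nearest_code(vec: Vector, codebook: Sequence[Vector]) -> int:
    """Return the index of the codebook entry closest to vec (squared L2)."""
    best_idx, best_dist = 0, float("inf")
    for idx, code in enumerate(codebook):
        dist = sum((a - b) ** 2 for a, b in zip(vec, code))
        if dist < best_dist:
            best_idx, best_dist = idx, dist
    return best_idx


def quantize(features: Sequence[Vector], codebook: Sequence[Vector]) -> List[int]:
    """VQ step: map each patch feature (row-major order) to a discrete image token."""
    return [nearest_code(f, codebook) for f in features]


def build_sequence(text_ids: Sequence[int], image_ids: Sequence[int],
                   sep_id: int, image_offset: int) -> List[int]:
    """Concatenate text tokens, a separator, and image tokens.

    Image codes are shifted by image_offset so they do not collide with text
    ids in a shared vocabulary (a hypothetical layout chosen for this sketch).
    """
    return list(text_ids) + [sep_id] + [image_offset + i for i in image_ids]


def next_token_pairs(seq: Sequence[int]) -> List[Tuple[List[int], int]]:
    """Autoregressive training pairs: predict seq[t + 1] from seq[: t + 1]."""
    return [(list(seq[: t + 1]), seq[t + 1]) for t in range(len(seq) - 1)]


if __name__ == "__main__":
    # Hypothetical 3-entry codebook and three 2-D patch features.
    codebook = [[0.0, 0.0], [1.0, 1.0], [0.0, 1.0]]
    features = [[0.1, -0.1], [0.9, 1.2], [0.2, 0.8]]
    image_ids = quantize(features, codebook)
    seq = build_sequence([5, 7], image_ids, sep_id=3, image_offset=100)
    print(image_ids, seq)
    for context, target in next_token_pairs(seq):
        print(context, "->", target)
```

With these toy inputs the patches quantize to codes `[0, 1, 2]`, and the combined sequence is `[5, 7, 3, 100, 101, 102]`. Putting text first means every image token is predicted conditioned on the full caption, which is what makes plain next-token prediction usable for text-to-image generation.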

Paper

arXiv: 2105.13290

Tags: generation, vision, research
