A multimodal foundation model that enables LLMs to "see" and "draw" via a discrete visual tokenizer (SEED). SEED-LLaMA pioneered multimodal in-context learning over interleaved image-and-text sequences; a rough sketch of the tokenizer idea follows below.
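The actual SEED tokenizer architecture is described in the paper; purely as an illustration, the sketch below (hypothetical names, toy encoder, PyTorch) shows the core idea of a discrete visual tokenizer: quantize image features against a learned codebook so an image becomes a short sequence of token ids that an LLM can interleave with text.

```python
# Minimal sketch, NOT the paper's implementation: a discrete visual
# tokenizer maps an image to codebook indices, which the LLM consumes
# like ordinary text tokens; image generation runs the path in reverse.
import torch
import torch.nn as nn

class DiscreteVisualTokenizer(nn.Module):
    """Toy stand-in for a SEED-style tokenizer: encode an image into
    discrete codes via nearest-neighbor lookup in a learned codebook."""

    def __init__(self, codebook_size: int = 8192, dim: int = 768, num_tokens: int = 32):
        super().__init__()
        self.codebook = nn.Embedding(codebook_size, dim)
        # Placeholder encoder; the real model uses a ViT-style backbone.
        self.encoder = nn.Linear(3 * 224 * 224, num_tokens * dim)
        self.num_tokens, self.dim = num_tokens, dim

    def encode(self, image: torch.Tensor) -> torch.Tensor:
        # image: (B, 3, 224, 224) -> continuous features: (B, num_tokens, dim)
        feats = self.encoder(image.flatten(1)).view(-1, self.num_tokens, self.dim)
        # Quantize: pick the nearest codebook entry per token position.
        codes = self.codebook.weight.unsqueeze(0).expand(feats.size(0), -1, -1)
        dists = torch.cdist(feats, codes)          # (B, num_tokens, codebook_size)
        return dists.argmin(dim=-1)                # discrete visual token ids

# With such a tokenizer, an interleaved image-text prompt flattens into
# one id sequence the LLM models autoregressively, e.g. (hypothetical):
#   ids = [*text_ids_a, *tokenizer.encode(img)[0].tolist(), *text_ids_b]
```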

Paper

arXiv: 2310.01218
Venue: ICLR 2024

Tags: multimodal, generation, vision, research