A multimodal foundation model that enables LLMs to "see" and "draw" through a discrete visual tokenizer. SEED-LLaMA demonstrated multimodal in-context learning over interleaved image and text sequences. Published at ICLR 2024.
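
The general idea of a discrete visual tokenizer can be shown with a toy vector-quantization step. Continuous image features are snapped to the nearest entry of a codebook. The resulting integer ids are then spliced into the text token stream between image delimiters, so the LLM can read and predict them like words. This is a minimal sketch under stated assumptions, not SEED's actual tokenizer: real tokenizers learn both the feature extractor and the codebook. The codebook values, the `BOI`/`EOI` ids, `VISUAL_OFFSET`, and the function names are all hypothetical.

```python
from typing import List, Sequence

# Hypothetical special-token ids marking the start/end of an image span.
BOI, EOI = 1000, 1001
# Hypothetical offset so visual code ids do not collide with text token ids.
VISUAL_OFFSET = 2000


def quantize(features: Sequence[Sequence[float]],
             codebook: Sequence[Sequence[float]]) -> List[int]:
    """Map each feature vector to the index of its nearest codebook entry (squared L2)."""
    codes = []
    for f in features:
        dists = [sum((a - b) ** 2 for a, b in zip(f, c)) for c in codebook]
        # min over indices returns the first index with the smallest distance
        codes.append(min(range(len(codebook)), key=dists.__getitem__))
    return codes


def interleave(text_ids: List[int], image_codes: List[int]) -> List[int]:
    """Append an image's discrete codes, wrapped in BOI/EOI, after a span of text ids."""
    return text_ids + [BOI] + [VISUAL_OFFSET + c for c in image_codes] + [EOI]


codebook = [[0.0, 0.0], [1.0, 1.0], [0.0, 1.0]]   # toy 3-entry codebook
features = [[0.9, 1.1], [0.1, -0.1], [0.2, 0.8]]  # toy "image patch" features
codes = quantize(features, codebook)               # [1, 0, 2]
sequence = interleave([5, 6], codes)               # [5, 6, 1000, 2001, 2000, 2002, 1001]
```

Because every image becomes ordinary token ids, one autoregressive model can consume and emit mixed image/text sequences. "Drawing" runs the other direction: the model predicts visual ids, and a separate decoder (not shown here) turns them back into pixels.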

Paper

Venue: ICLR 2024
Citations: 10
Tags: multimodal, generation, vision, research