Vision-language model that adds visual expert modules to a pretrained language model.

Paper

arXiv: 2311.03079

Library

GitHub Repository

multimodal, open-weight, vision

Related