A vision-language model that adds visual expert modules inside a pretrained language model.

Paper

Citations 81

Library

multimodal, open-weight, vision

Related