Bilingual (Chinese/English) models for multimodal representation and generation.
multimodal, nlp, open-weight