Next-generation vision-language model. CogVLM2-Video introduced multi-frame time localization for precise video understanding.

Paper

Citations 6
multimodalopen-weightvideo

Related