Next-generation vision-language model. CogVLM2-Video introduced multi-frame time localization for precise video understanding.

Paper

multimodalopen-weightvideo

Related