Next-generation vision-language model. CogVLM2-Video introduced multi-frame time localization for precise video understanding.

Paper

arXiv: 2408.16500

multimodalopen-weightvideo

Related