Vision-language model with dynamic resolution processing and absolute time encoding.

Paper

arXiv: 2502.13923

multimodalopen-weightvision

Related