Large language models as effective temporal learners for video understanding. Published at ECCV 2024.

Paper

arXiv: 2404.00308

Venue: ECCV 2024

video, multimodal, research