Multimodal model specialized in localized visual grounding and reasoning.

Library

GitHub Repository

multimodalvision

Notes

Published at ECCV 2024.