Grounding DINO

Open-set object detector that combines the Transformer-based detector DINO with grounded pre-training. It detects arbitrary objects specified by text prompts, which can be category names or referring expressions. It achieves 52.5 AP on COCO in zero-shot transfer, that is, without any COCO training data, and 63.0 AP after fine-tuning on COCO. It is one of the most widely adopted open-set detectors.
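To make the text-prompt interface concrete, here is a minimal sketch of how a list of category names might be turned into a single prompt string. It assumes the convention described in the public repository and the Hugging Face documentation: lowercase phrases separated by periods, with a trailing period. The helper name `build_text_prompt` is hypothetical and is not part of any library.

```python
def build_text_prompt(categories):
    """Join category names into one lowercase, period-separated prompt.

    Assumed convention: "cat . dog ." (lowercase, " . " between phrases,
    trailing period). This helper is illustrative, not a library function.
    """
    # Normalize each name: trim whitespace, lowercase, drop a trailing period.
    cleaned = [c.strip().lower().rstrip(".").strip() for c in categories]
    # Discard entries that are empty after normalization.
    cleaned = [c for c in cleaned if c]
    if not cleaned:
        raise ValueError("at least one non-empty category name is required")
    return " . ".join(cleaned) + " ."


prompt = build_text_prompt(["Cat", "remote control ", "Dog."])
print(prompt)  # cat . remote control . dog .
```

The resulting string would be passed to the model as the text input alongside the image. A referring expression such as "the dog on the left" can be used the same way as a single phrase.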

Paper (arXiv) GitHub HuggingFace

Outputs 2

model

Open-set object detection model guided by text prompts. Base and large model variants are available.

GitHub HuggingFace

Grounding DINO: Marrying DINO with Grounded Pre-Training for Open-Set Object Detection

paper

Presents the Grounding DINO architecture, which combines the Transformer-based detector DINO with grounded pre-training for open-set object detection.

Paper (arXiv)

Venue ECCV 2024

Citations 245

arXiv HTML

vision open-vocabulary open-source