Open-set object detector that combines the Transformer-based DINO detector with grounded pre-training. Detects arbitrary objects specified by text prompts, either category names or referring expressions. Achieves 52.5 zero-shot AP on COCO without any COCO training data, and 63.0 AP after fine-tuning on COCO. One of the most widely adopted open-set detectors in the community.
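As a rough illustration of prompting with category names: the reference implementation is commonly given a single lowercased string in which categories are separated by periods. The sketch below only builds such a string. The helper name `build_prompt` is hypothetical and not part of any Grounding DINO API, and the exact separator convention is an assumption worth checking against the release you use.

```python
def build_prompt(categories):
    """Join category names into one lowercased, period-separated prompt.

    Assumption: the detector expects a form like "cat . dog ." (verify
    against the documentation of the implementation you are running).
    """
    # Normalize each name: trim whitespace, lowercase, drop a trailing period.
    cleaned = [c.strip().lower().rstrip(".").strip() for c in categories]
    # Discard names that are empty after normalization.
    cleaned = [c for c in cleaned if c]
    return (" . ".join(cleaned) + " .") if cleaned else ""


prompt = build_prompt(["Cat", "remote control ", "Dog."])
print(prompt)  # cat . remote control . dog .
```

A referring expression such as "the dog on the left" would typically be passed as a single phrase rather than split into categories.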

Outputs 2

Grounding DINO

model

Open-set object detection model with text-guided detection. Base and large model variants available.

Grounding DINO: Marrying DINO with Grounded Pre-Training for Open-Set Object Detection

paper

Presents the Grounding DINO architecture combining Transformer-based detection with grounded pre-training for open-set object detection.

arXiv: 2303.05499

Venue: ECCV 2024

Tags: vision, open-vocabulary, open-source