Marries Grounding DINO with Segment Anything for automated open-world detection, segmentation, and generation. Achieves 48.7 mean AP on SegInW zero-shot benchmark. Grounded-SAM-2 extends support to video with SAM 2 and Florence-2 integration. One of the most popular open-source vision pipelines with 15k+ GitHub stars.

Outputs 3

Grounded-Segment-Anything

library

Combines Grounding DINO with SAM, Stable Diffusion, and Recognize Anything for automated detect-segment-generate pipelines.

GitHub Repository

Grounded-SAM-2

library

Extends Grounded SAM to video with SAM 2 and Florence-2 for grounding and tracking anything in videos.

GitHub Repository

Grounded SAM: Assembling Open-World Models for Diverse Visual Tasks

paper

Technical report on the Grounded SAM pipeline for assembling open-world vision models for diverse visual tasks.

arXiv: 2401.14159

visionopen-vocabularyopen-source