Segment Anything (SAM)
Foundation model for image segmentation. Trained on SA-1B, the largest segmentation dataset to date (1B+ masks on 11M images). Uses a promptable architecture: given points, boxes, or text prompts, SAM generates high-quality segmentation masks in real time.
SAM established the "foundation model" paradigm for vision tasks, analogous to what GPT achieved for language. Extended by SAM 2 (2024) to video via a streaming memory mechanism. ICCV 2023. 42K+ GitHub stars. By Kirillov et al. Apache 2.0.
Paper
arXiv: 2304.02643
Venue: ICCV 2023
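A minimal sketch of the point-prompting workflow using the official `segment_anything` package. The checkpoint filename refers to the released ViT-H weights, which must be downloaded separately from the repository; the image loading step and the `segment_at_point` helper are illustrative assumptions, not part of the library.

```python
# Sketch: prompt SAM with a single foreground point and get candidate masks.
# Assumes `pip install segment-anything` and the ViT-H checkpoint downloaded
# locally (see the SAM repository's model zoo for the link).
from pathlib import Path

CHECKPOINT = "sam_vit_h_4b8939.pth"  # assumed local path to ViT-H weights


def segment_at_point(image, x, y):
    """Return candidate masks for the object under pixel (x, y).

    `image` is an HxWx3 uint8 RGB array (e.g. loaded via OpenCV/PIL).
    """
    import numpy as np
    from segment_anything import SamPredictor, sam_model_registry

    sam = sam_model_registry["vit_h"](checkpoint=CHECKPOINT)
    predictor = SamPredictor(sam)
    predictor.set_image(image)  # computes the image embedding once
    masks, scores, _ = predictor.predict(
        point_coords=np.array([[x, y]]),
        point_labels=np.array([1]),  # 1 = foreground, 0 = background
        multimask_output=True,       # return several candidate masks
    )
    return masks, scores


if __name__ == "__main__":
    if not Path(CHECKPOINT).exists():
        print("checkpoint not found; download it from the SAM model zoo")
```

Because the image embedding is computed once in `set_image`, subsequent prompts (more points, boxes) on the same image are cheap, which is what makes interactive, real-time prompting practical.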