Segment Anything (SAM)
Foundation model for image segmentation. Trained on SA-1B, the largest segmentation dataset to date (1B+ masks on 11M images). Uses a promptable architecture: given points, boxes, or text prompts, SAM generates high-quality segmentation masks in real time.
SAM established the "foundation model" paradigm for vision tasks, analogous to what GPT achieved for language. Extended by SAM 2 (2024) to video via a streaming memory mechanism. ICCV 2023. 42K+ GitHub stars. By Kirillov et al. Apache 2.0.
Paper
arXiv: 2304.02643
Venue: ICCV 2023
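A minimal sketch of the point-prompting workflow using the official `segment_anything` package. The checkpoint filename refers to the released ViT-H weights, which must be downloaded separately from the repository; the image loading step and the `segment_at_point` helper are illustrative assumptions, not part of the library.

```python
# Sketch: prompt SAM with a single foreground point and get candidate masks.
# Assumes `pip install segment-anything` and the ViT-H checkpoint downloaded
# locally (see the SAM repository's model zoo for the link).
from pathlib import Path

CHECKPOINT = "sam_vit_h_4b8939.pth"  # assumed local path to ViT-H weights


def segment_at_point(image, x, y):
    """Return candidate masks for the object under pixel (x, y).

    `image` is an HxWx3 uint8 RGB array (e.g. loaded via OpenCV/PIL).
    """
    import numpy as np
    from segment_anything import SamPredictor, sam_model_registry

    sam = sam_model_registry["vit_h"](checkpoint=CHECKPOINT)
    predictor = SamPredictor(sam)
    predictor.set_image(image)  # computes the image embedding once
    masks, scores, _ = predictor.predict(
        point_coords=np.array([[x, y]]),
        point_labels=np.array([1]),  # 1 = foreground, 0 = background
        multimask_output=True,       # return several candidate masks
    )
    return masks, scores


if __name__ == "__main__":
    if not Path(CHECKPOINT).exists():
        print("checkpoint not found; download it from the SAM model zoo")
```

Because the image embedding is computed once in `set_image`, subsequent prompts (more points, boxes) on the same image are cheap, which is what makes interactive, real-time prompting practical.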