Unified Transformer-based framework for object detection and segmentation. Extends DINO with a mask prediction branch supporting instance, panoptic, and semantic segmentation. Achieves 54.7 AP on COCO instance segmentation, 59.5 PQ on COCO panoptic segmentation, and 60.8 mIoU on ADE20K semantic segmentation.

Outputs 2

Mask DINO

library

Official implementation of Mask DINO. The authors report the best results at the time of publication on all three segmentation tasks (instance, panoptic, and semantic) with a single architecture.

GitHub Repository

Mask DINO: Towards A Unified Transformer-based Framework for Object Detection and Segmentation

paper

Extends DINO with a mask prediction branch shared across all image segmentation tasks. Each query embedding is combined by dot product with a high-resolution pixel embedding map to predict a binary mask.

arXiv: 2206.02777

Venue: CVPR 2023
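
The query-based mask prediction summarized in the paper entry can be illustrated with a minimal, dependency-free sketch. It is not the official implementation. The function name predict_masks, the sigmoid-plus-threshold step, and the toy values are assumptions made for illustration; a real model would run this as a batched tensor operation on learned embeddings.

```python
import math


def predict_masks(queries, pixel_embeddings, threshold=0.5):
    """Toy dot-product mask head (hypothetical helper, not the library's API).

    queries: N query embeddings, each a list of C floats.
    pixel_embeddings: H x W grid, each cell a list of C floats.
    Returns N binary masks, each an H x W grid of 0/1 values.
    """
    masks = []
    for query in queries:
        mask = []
        for row in pixel_embeddings:
            mask_row = []
            for pixel in row:
                # Mask logit = dot product of the query and the pixel embedding.
                logit = sum(q * p for q, p in zip(query, pixel))
                # Assumed post-processing: sigmoid, then a fixed threshold.
                prob = 1.0 / (1.0 + math.exp(-logit))
                mask_row.append(1 if prob > threshold else 0)
            mask.append(mask_row)
        masks.append(mask)
    return masks


# Toy input: 2 queries, embedding size C = 2, a 2 x 2 pixel embedding map.
queries = [[1.0, 0.0], [0.0, 1.0]]
pixel_embeddings = [
    [[3.0, -3.0], [-3.0, 3.0]],
    [[3.0, 3.0], [-3.0, -3.0]],
]
masks = predict_masks(queries, pixel_embeddings)
for mask in masks:
    print(mask)
```

Because every query is scored against the same pixel embedding map, the same branch can produce instance, panoptic, or semantic masks; what differs between tasks is how the queries are trained and how the resulting masks are assigned to classes.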

vision, open-source