The Track Any Point TRansformer (TAPTR) series casts tracking any point (TAP) as point-level visual prompt detection built on DETR. TAPTRv2 introduces an attention-based position update that removes the cost-volume dependency. TAPTRv3 adds visibility-aware long-temporal attention for robust tracking in long videos.


TAPTR

library

Official implementation of TAPTR, TAPTRv2, and TAPTRv3 for tracking any point in videos using Transformer-based detection.

GitHub Repository

TAPTR: Tracking Any Point with Transformers as Detection

paper

Formulates tracking any point as point-level visual prompt detection built on the DETR architecture.

arXiv: 2403.13042

Venue: ECCV 2024
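TAPTR's central idea, treating each tracked point as a DETR-style query that cross-attends to frame features and regresses its location, can be illustrated with a minimal numpy sketch. All names, dimensions, and the single-layer setup below are illustrative assumptions, not the official implementation:

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def decode_point(query, feat_map, w_out):
    """One DETR-style decoder step: the point query cross-attends to the
    frame's feature map, then a linear head regresses its (x, y) location."""
    H, W, C = feat_map.shape
    keys = feat_map.reshape(H * W, C)           # flatten the spatial grid
    attn = softmax(keys @ query / np.sqrt(C))   # attention over all locations
    ctx = attn @ keys                           # attended context vector
    xy = 1.0 / (1.0 + np.exp(-(ctx @ w_out)))   # sigmoid -> normalized coords
    return xy, attn

C = 16
feat_map = rng.normal(size=(8, 8, C))   # features of one video frame (toy)
query = rng.normal(size=C)              # point-level visual prompt (query)
w_out = rng.normal(size=(C, 2))         # hypothetical regression head

xy, attn = decode_point(query, feat_map, w_out)
```

In the papers this step is stacked across decoder layers and run per frame; the sketch keeps only the query-attends-then-regresses pattern.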

TAPTRv2: Attention-based Position Update Improves Tracking Any Point

paper

Introduces an attention-based position update that removes cost-volume computation while achieving state-of-the-art performance.

arXiv: 2407.16291

Venue: NeurIPS 2024
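The v2 mechanism, updating the point's position from attention weights rather than a full cost volume, can be sketched as an attention-weighted sum of candidate displacements. The sampling scheme, dimensions, and function name here are assumptions for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(x):
    x = x - x.max()
    e = np.exp(x)
    return e / e.sum()

def attn_position_update(pos, query, key_feats, offsets):
    """Update the position as an attention-weighted sum of candidate
    offsets -- no dense cost volume over the frame is ever built."""
    scores = key_feats @ query / np.sqrt(query.size)
    w = softmax(scores)        # one weight per sampled candidate
    delta = w @ offsets        # expected displacement under the attention
    return pos + delta, w

C, K = 16, 9
pos = np.array([0.5, 0.5])                      # current normalized position
query = rng.normal(size=C)                      # point query features
key_feats = rng.normal(size=(K, C))             # features at K sampled points
offsets = rng.normal(scale=0.05, size=(K, 2))   # candidate displacements (toy)

new_pos, w = attn_position_update(pos, query, key_feats, offsets)
```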

TAPTRv3: Spatial and Temporal Context Foster Robust Tracking of Any Point in Long Video

paper

Proposes visibility-aware long-temporal attention and context-aware cross-attention for robust point tracking in long videos.

arXiv: 2411.18671

Venue: ICLR 2026
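The visibility-aware idea in v3, letting temporal attention ignore frames where the point was predicted invisible, can be sketched by masking attention logits with a visibility score before the softmax. The threshold, state shapes, and function name are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)

def visibility_aware_attention(query, past_states, visibility, thresh=0.5):
    """Attend over past-frame query states, excluding frames where the
    point was predicted invisible (e.g. occluded) from the softmax."""
    scores = past_states @ query / np.sqrt(query.size)
    scores = np.where(visibility >= thresh, scores, -np.inf)  # mask occluded frames
    scores = scores - scores.max()
    e = np.exp(scores)
    w = e / e.sum()
    return w @ past_states, w

C, T = 16, 6
query = rng.normal(size=C)                       # current point query
past_states = rng.normal(size=(T, C))            # per-frame query states
visibility = np.array([0.9, 0.1, 0.8, 0.2, 0.95, 0.7])  # toy visibility scores

ctx, w = visibility_aware_attention(query, past_states, visibility)
```

Masked frames receive exactly zero attention weight, so occluded history cannot corrupt the aggregated temporal context.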

vision, open-source