The Track Any Point TRansformer (TAPTR) series casts tracking any point (TAP) as point-level visual prompt detection built on DETR. TAPTRv2 introduces an attention-based position update that removes the cost-volume dependency. TAPTRv3 adds visibility-aware long-temporal attention for robust tracking in long videos.


TAPTR

library

Official implementation of TAPTR, TAPTRv2, and TAPTRv3 for tracking any point in videos using Transformer-based detection.

GitHub Repository

TAPTR: Tracking Any Point with Transformers as Detection

paper

Formulates tracking any point as point-level visual prompt detection built on the DETR architecture.

arXiv: 2403.13042

Venue: ECCV 2024
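TAPTR's central idea, treating each tracked point as a DETR-style query that cross-attends to frame features and regresses its location, can be illustrated with a minimal numpy sketch. All names, dimensions, and the single-layer setup below are illustrative assumptions, not the official implementation:

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def decode_point(query, feat_map, w_out):
    """One DETR-style decoder step: the point query cross-attends to the
    frame's feature map, then a linear head regresses its (x, y) location."""
    H, W, C = feat_map.shape
    keys = feat_map.reshape(H * W, C)           # flatten the spatial grid
    attn = softmax(keys @ query / np.sqrt(C))   # attention over all locations
    ctx = attn @ keys                           # attended context vector
    xy = 1.0 / (1.0 + np.exp(-(ctx @ w_out)))   # sigmoid -> normalized coords
    return xy, attn

C = 16
feat_map = rng.normal(size=(8, 8, C))   # features of one video frame (toy)
query = rng.normal(size=C)              # point-level visual prompt (query)
w_out = rng.normal(size=(C, 2))         # hypothetical regression head

xy, attn = decode_point(query, feat_map, w_out)
```

In the papers this step is stacked across decoder layers and run per frame; the sketch keeps only the query-attends-then-regresses pattern.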

TAPTRv2: Attention-based Position Update Improves Tracking Any Point

paper

Introduces an attention-based position update that removes cost-volume computation while achieving state-of-the-art performance.

arXiv: 2407.16291

Venue: NeurIPS 2024
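The v2 mechanism, updating the point's position from attention weights rather than a full cost volume, can be sketched as an attention-weighted sum of candidate displacements. The sampling scheme, dimensions, and function name here are assumptions for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(x):
    x = x - x.max()
    e = np.exp(x)
    return e / e.sum()

def attn_position_update(pos, query, key_feats, offsets):
    """Update the position as an attention-weighted sum of candidate
    offsets -- no dense cost volume over the frame is ever built."""
    scores = key_feats @ query / np.sqrt(query.size)
    w = softmax(scores)        # one weight per sampled candidate
    delta = w @ offsets        # expected displacement under the attention
    return pos + delta, w

C, K = 16, 9
pos = np.array([0.5, 0.5])                      # current normalized position
query = rng.normal(size=C)                      # point query features
key_feats = rng.normal(size=(K, C))             # features at K sampled points
offsets = rng.normal(scale=0.05, size=(K, 2))   # candidate displacements (toy)

new_pos, w = attn_position_update(pos, query, key_feats, offsets)
```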

TAPTRv3: Spatial and Temporal Context Foster Robust Tracking of Any Point in Long Video

paper

Proposes visibility-aware long-temporal attention and context-aware cross-attention for robust point tracking in long videos.

arXiv: 2411.18671

Venue: ICLR 2026
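The visibility-aware idea in v3, letting temporal attention ignore frames where the point was predicted invisible, can be sketched by masking attention logits with a visibility score before the softmax. The threshold, state shapes, and function name are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)

def visibility_aware_attention(query, past_states, visibility, thresh=0.5):
    """Attend over past-frame query states, excluding frames where the
    point was predicted invisible (e.g. occluded) from the softmax."""
    scores = past_states @ query / np.sqrt(query.size)
    scores = np.where(visibility >= thresh, scores, -np.inf)  # mask occluded frames
    scores = scores - scores.max()
    e = np.exp(scores)
    w = e / e.sum()
    return w @ past_states, w

C, T = 16, 6
query = rng.normal(size=C)                       # current point query
past_states = rng.normal(size=(T, C))            # per-frame query states
visibility = np.array([0.9, 0.1, 0.8, 0.2, 0.95, 0.7])  # toy visibility scores

ctx, w = visibility_aware_attention(query, past_states, visibility)
```

Masked frames receive exactly zero attention weight, so occluded history cannot corrupt the aggregated temporal context.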

vision, open-source