Boosting 3D Single Object Tracking With 2D Matching Distillation and 3D Pre-Training

MCML Authors

Dr. Yan Xia (former member)

Group Daniel Cremers: Computer Vision & Artificial Intelligence

Abstract

3D single object tracking (SOT) is an essential task in autonomous driving and robotics. However, learning robust 3D SOT trackers remains challenging because category-specific point cloud data is limited and LiDAR scans are inherently sparse and incomplete. To tackle these issues, we propose a unified 3D SOT framework that leverages 3D generative pre-training and learns robust 3D matching abilities from 2D pre-trained foundation trackers. The framework uses a target-matching architecture consistent with widely used 2D trackers, which facilitates the transfer of 2D matching knowledge. Specifically, we first propose a lightweight Target-Aware Projection (TAP) module that allows the pre-trained 2D tracker to work well on projected point clouds without further fine-tuning. We then propose a novel IoU-guided matching-distillation framework that uses the powerful 2D pre-trained trackers to guide matching learning in the 3D tracker: the 3D template-to-search matching should be consistent with the corresponding 2D template-to-search matching obtained from the 2D pre-trained trackers. Our designs are applied to two mainstream 3D SOT frameworks, memory-less Siamese and contextual memory-based approaches, and the resulting trackers are named SiamDisst and MemDisst, respectively. Extensive experiments show that SiamDisst and MemDisst achieve state-of-the-art performance on the KITTI, Waymo Open Dataset, and nuScenes benchmarks while running above real-time speed, at 25 and 90 FPS on an RTX 3090 GPU.
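The matching-distillation idea in the abstract can be illustrated with a minimal sketch. The abstract does not state the loss, so everything below is an assumption made for illustration and not the authors' implementation: the matching maps are taken to be template-to-search similarity logits of shape (template tokens, search tokens), the 2D and 3D maps are assumed to be already aligned token by token, the consistency term is a KL divergence, and "IoU-guided" is read as scaling that term by the IoU between the 2D teacher's predicted box and the ground-truth box. The function names `box_iou_2d` and `iou_guided_matching_distillation` are hypothetical.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def box_iou_2d(a, b):
    """IoU of two axis-aligned boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def iou_guided_matching_distillation(student_logits, teacher_logits, iou, eps=1e-8):
    """Hypothetical loss: IoU-weighted KL(teacher || student) over matching maps.

    student_logits: 3D tracker's template-to-search logits, shape (n_template, n_search).
    teacher_logits: 2D tracker's logits with the same shape (alignment is assumed).
    iou: scalar in [0, 1]; how much the 2D teacher is trusted on this sample (assumption).
    """
    p_teacher = softmax(teacher_logits, axis=-1)
    p_student = softmax(student_logits, axis=-1)
    # KL divergence per template token, summed over search tokens.
    kl = np.sum(p_teacher * (np.log(p_teacher + eps) - np.log(p_student + eps)), axis=-1)
    # Average over template tokens, then down-weight unreliable teacher predictions.
    return float(iou * kl.mean())


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    student = rng.normal(size=(4, 6))   # stand-in for 3D matching logits
    teacher = rng.normal(size=(4, 6))   # stand-in for 2D matching logits
    weight = box_iou_2d((0, 0, 2, 2), (1, 1, 3, 3))  # teacher box vs. ground truth
    print(iou_guided_matching_distillation(student, teacher, weight))
```

Under this reading, a sample on which the 2D teacher localizes the target poorly contributes little to the distillation term, so the 3D tracker is guided mainly by reliable 2D matches.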

ECCV 2024

18th European Conference on Computer Vision. Milano, Italy, Sep 29-Oct 04, 2024.

Authors

Q. Wu • Y. Xia • J. Wan • A. B. Chan

Research Area

B1 | Computer Vision

BibTeXKey: WXW+24