Home  | Publications | WXW+24

Boosting 3D Single Object Tracking With 2D Matching Distillation and 3D Pre-Training

MCML Authors

Abstract

3D single object tracking (SOT) is an essential task in autonomous driving and robotics. However, learning robust 3D SOT trackers remains challenging due to the limited category-specific point cloud data and the inherent sparsity and incompleteness of LiDAR scans. To tackle these issues, we propose a unified 3D SOT framework that leverages 3D generative pre-training and learns robust 3D matching abilities from 2D pre-trained foundation trackers. Our framework features a consistent target-matching architecture with the widely used 2D trackers, facilitating the transfer of 2D matching knowledge. Specifically, we first propose a lightweight Target-Aware Projection (TAP) module, allowing the pre-trained 2D tracker to work well on the projected point clouds without further fine-tuning. We then propose a novel IoU-guided matching-distillation framework that utilizes the powerful 2D pre-trained trackers to guide 3D matching learning in the 3D tracker, i.e., the 3D template-to-search matching should be consistent with its corresponding 2D template-to-search matching obtained from 2D pre-trained trackers. Our designs are applied to two mainstream 3D SOT frameworks: memory-less Siamese and contextual memory-based approaches, which are respectively named SiamDisst and MemDisst. Extensive experiments show that SiamDisst and MemDisst achieve state-of-the-art performance on KITTI, Waymo Open Dataset and nuScenes benchmarks, while running at above real-time speed of 25 and 90 FPS on a RTX3090 GPU.

inproceedings


ECCV 2024

18th European Conference on Computer Vision. Milano, Italy, Sep 29-Oct 04, 2024.
Conference logo
A* Conference

Authors

Q. Wu • Y. Xia • J. Wan • A. B. Chan

Links

DOI

Research Area

 B1 | Computer Vision

BibTeXKey: WXW+24

Back to Top