MonoCT: Overcoming Monocular 3D Detection Domain Shift With Consistent Teacher Models

MCML Authors

Johannes Meier

→ Group Daniel Cremers
Computer Vision & Artificial Intelligence

Oussema Dhaouadi

→ Group Daniel Cremers
Computer Vision & Artificial Intelligence

Yan Xia

Dr.

* Former Member

→ Group Daniel Cremers
Computer Vision & Artificial Intelligence

Daniel Cremers

Prof. Dr.

Director

Computer Vision & Artificial Intelligence

Abstract

We tackle the problem of monocular 3D object detection across different sensors, environments, and camera setups. In this paper, we introduce a novel unsupervised domain adaptation approach, MonoCT, that generates highly accurate pseudo labels for self-supervision. Inspired by our observation that accurate depth estimation is critical to mitigating domain shifts, MonoCT introduces a novel Generalized Depth Enhancement (GDE) module with an ensemble concept to improve depth estimation accuracy. Moreover, we introduce a novel Pseudo Label Scoring (PLS) module by exploring inner-model consistency measurement and a Diversity Maximization (DM) strategy to further generate high-quality pseudo labels for self-training. Extensive experiments on six benchmarks show that MonoCT outperforms existing SOTA domain adaptation methods by large margins (~21% minimum for AP Mod.) and generalizes well to car, traffic camera and drone views.

inproceedings MID+25a

IV 2025

36th IEEE Intelligent Vehicles Symposium. Cluj-Napoca, Romania, Jun 22-25, 2025. To be published. Preprint available.

Authors

J. Meier • L. Inchingolo • O. Dhaouadi • Y. Xia • J. Kaiser • D. Cremers

Links

arXiv

Research Area

B1 | Computer Vision

BibTeXKey: MID+25a
