Poster in Workshop: Machine Learning for Autonomous Driving
Monocular 3D Object Detection by Leveraging Self-Supervised Visual Pre-training
Can Erhan · Anıl Öztürk · Burak Gunel · Nazim Kemal Ure
Precise detection of 3D objects is a critical task in autonomous driving. The monocular 3D object detection problem is defined as predicting 3D bounding boxes in metric space from a single monocular image. Most 3D detectors follow the standard strategy of pre-training on the supervised ImageNet dataset, which was created for a dissimilar task: image classification. In this paper, a simple and effective pre-training strategy is proposed for monocular 3D object detection that requires no human supervision or annotated data. A dense depth estimation pretext task, learned through self-supervision, is incorporated into the pre-training pipeline. Experiments show that transferring the pre-trained weights to the detection network improves performance on the 3D object detection and bird's-eye-view evaluations, with an improvement rate of up to 25% relative to baseline networks based on ImageNet pre-training. This strategy could be applied to other 3D object detection methods without any modification to their existing algorithm design.
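The abstract does not describe how the pre-trained weights are moved into the detector. The following is a minimal sketch, assuming the depth-estimation network and the detector share an identical encoder/backbone architecture and that parameters are stored as name-keyed dictionaries. The function `transfer_backbone_weights` and the prefixes `depth_net.encoder.` and `detector.backbone.` are hypothetical, and flat Python lists stand in for real tensors. Only the encoder is transferred: the depth decoder is discarded, and the detection head keeps its own initialization. This is why the approach needs no change to the detector's design, since only the starting weights differ.

```python
from typing import Dict, List, Tuple

# Hypothetical parameter store: name -> (shape, flat values).
# Real frameworks use tensors; plain lists keep this sketch self-contained.
Params = Dict[str, Tuple[Tuple[int, ...], List[float]]]


def transfer_backbone_weights(
    pretrained: Params,
    detector: Params,
    src_prefix: str = "depth_net.encoder.",   # hypothetical naming scheme
    dst_prefix: str = "detector.backbone.",   # hypothetical naming scheme
) -> Tuple[Params, List[str]]:
    """Copy encoder weights from a depth-estimation model into a detector.

    A parameter is copied only if it lives under `src_prefix` and its renamed
    key exists in the detector with an identical shape. Depth-decoder weights
    are skipped, and detector parameters without a match are left unchanged.
    """
    updated = dict(detector)  # shallow copy: the input dict is not mutated
    copied: List[str] = []
    for name, (shape, values) in pretrained.items():
        if not name.startswith(src_prefix):
            continue  # e.g. depth-decoder parameters are not transferred
        target = dst_prefix + name[len(src_prefix):]
        if target in updated and updated[target][0] == shape:
            updated[target] = (shape, list(values))
            copied.append(target)
    return updated, copied


# Toy usage with made-up parameter values (not results from the paper).
pretrained = {
    "depth_net.encoder.conv1.weight": ((2,), [0.5, -0.5]),
    "depth_net.decoder.out.weight": ((1,), [0.9]),
}
detector = {
    "detector.backbone.conv1.weight": ((2,), [0.0, 0.0]),
    "detector.head.cls.weight": ((3,), [0.1, 0.2, 0.3]),
}
new_params, copied = transfer_backbone_weights(pretrained, detector)
print(copied)  # ['detector.backbone.conv1.weight']
```

In a PyTorch setting, the same idea would typically be expressed by filtering and renaming the pre-trained `state_dict` and loading it with `load_state_dict(..., strict=False)`, so that the detection head's unmatched parameters keep their initial values.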