Recent work has shown that CNN-based depth and ego-motion estimators can be learned from unlabelled monocular videos. However, performance is limited by unidentified moving objects that violate the underlying static-scene assumption of geometric image reconstruction. More significantly, due to the lack of proper constraints, networks output scale-inconsistent results across different samples, i.e., the ego-motion network cannot provide a full camera trajectory over a long video sequence because of the per-frame scale ambiguity. This paper tackles these challenges by proposing a geometry consistency loss for scale-consistent predictions and an induced self-discovered mask for handling moving objects and occlusions. Since we do not leverage multi-task learning as recent works do, our framework is much simpler and more efficient. Comprehensive evaluation results demonstrate that our depth estimator achieves state-of-the-art performance on the KITTI dataset. Moreover, we show that our ego-motion network is able to predict a globally scale-consistent camera trajectory over long video sequences, and that the resulting visual odometry accuracy is competitive with a recent model trained on stereo videos. To the best of our knowledge, this is the first work to show that deep networks trained on unlabelled monocular videos can predict globally scale-consistent camera trajectories over a long video sequence.
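To make the two proposed components concrete, below is a minimal PyTorch sketch of a geometry consistency loss and the induced self-discovered mask, following the description in the abstract. The function name, tensor shapes, and epsilon term are illustrative assumptions rather than the authors' reference implementation; the two depth inputs are presumed to come from a standard depth-warping step using the predicted relative camera pose.

```python
# Sketch (assumed names/shapes): geometry consistency loss and
# self-discovered mask for unsupervised depth/ego-motion learning.
import torch


def geometry_consistency(d_warp: torch.Tensor,
                         d_interp: torch.Tensor,
                         eps: float = 1e-7):
    """d_warp:   frame b's depth computed by warping frame a's depth
                 with the predicted relative pose, shape (B, 1, H, W).
       d_interp: frame b's predicted depth sampled at the projected
                 pixel locations, shape (B, 1, H, W)."""
    # Normalised depth inconsistency in [0, 1): symmetric in both
    # predictions, so minimising it enforces a consistent scale
    # between neighbouring frames.
    diff = (d_warp - d_interp).abs() / (d_warp + d_interp + eps)

    # Geometry consistency loss: mean inconsistency over all pixels.
    loss_gc = diff.mean()

    # Self-discovered mask: near 1 where the two depths agree (static
    # regions), near 0 at moving objects and occlusions; it can be
    # used to down-weight the photometric loss at unreliable pixels.
    mask = 1.0 - diff
    return loss_gc, mask


if __name__ == "__main__":
    # Toy check with random positive depths.
    d1 = torch.rand(2, 1, 8, 8) + 0.5
    d2 = torch.rand(2, 1, 8, 8) + 0.5
    loss, mask = geometry_consistency(d1, d2)
    print(loss.item(), mask.min().item(), mask.max().item())
```

Because the inconsistency map is normalised to [0, 1), the mask `1 - diff` can weight the photometric error directly, handling moving objects and occlusions without extra hyperparameters or auxiliary networks.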
Author Information
Jiawang Bian (The University of Adelaide)
Zhichao Li (TuSimple)
Naiyan Wang (Hong Kong University of Science and Technology)
Huangying Zhan (The University of Adelaide)
Chunhua Shen (The University of Adelaide)
Ming-Ming Cheng (Nankai University)
Ian Reid (The University of Adelaide)
More from the Same Authors
- 2022 Poster: Fully Sparse 3D Object Detection
  Lue Fan · Feng Wang · Naiyan Wang · Zhao-Xiang Zhang
- 2022 Poster: SegNeXt: Rethinking Convolutional Attention Design for Semantic Segmentation
  Meng-Hao Guo · Cheng-Ze Lu · Qibin Hou · Zhengning Liu · Ming-Ming Cheng · Shi-min Hu
- 2022 Spotlight: SegNeXt: Rethinking Convolutional Attention Design for Semantic Segmentation
  Meng-Hao Guo · Cheng-Ze Lu · Qibin Hou · Zhengning Liu · Ming-Ming Cheng · Shi-min Hu
- 2021 Poster: Twins: Revisiting the Design of Spatial Attention in Vision Transformers
  Xiangxiang Chu · Zhi Tian · Yuqing Wang · Bo Zhang · Haibing Ren · Xiaolin Wei · Huaxia Xia · Chunhua Shen
- 2021 Poster: Dynamic Neural Representational Decoders for High-Resolution Semantic Segmentation
  Bowen Zhang · Yifan Liu · Zhi Tian · Chunhua Shen
- 2020 Poster: ICNet: Intra-saliency Correlation Network for Co-Saliency Detection
  Wen-Da Jin · Jun Xu · Ming-Ming Cheng · Yi Zhang · Wei Guo
- 2020 Poster: SOLOv2: Dynamic and Fast Instance Segmentation
  Xinlong Wang · Rufeng Zhang · Tao Kong · Lei Li · Chunhua Shen
- 2019 Poster: Social-BiGAT: Multimodal Trajectory Forecasting using Bicycle-GAN and Graph Attention Networks
  Vineet Kosaraju · Amir Sadeghian · Roberto Martín-Martín · Ian Reid · Hamid Rezatofighi · Silvio Savarese
- 2019 Poster: Multi-marginal Wasserstein GAN
  Jiezhang Cao · Langyuan Mo · Yifan Zhang · Kui Jia · Chunhua Shen · Mingkui Tan
- 2018 Poster: Self-Erasing Network for Integral Object Attention
  Qibin Hou · PengTao Jiang · Yunchao Wei · Ming-Ming Cheng
- 2017 Poster: Deep Subspace Clustering Networks
  Pan Ji · Tong Zhang · Hongdong Li · Mathieu Salzmann · Ian Reid
- 2017 Poster: A Bayesian Data Augmentation Approach for Learning Deep Models
  Toan Tran · Trung Pham · Gustavo Carneiro · Lyle Palmer · Ian Reid
- 2016 Poster: Image Restoration Using Very Deep Convolutional Encoder-Decoder Networks with Symmetric Skip Connections
  Xiaojiao Mao · Chunhua Shen · Yu-Bin Yang
- 2015 Poster: Deeply Learning the Messages in Message Passing Inference
  Guosheng Lin · Chunhua Shen · Ian Reid · Anton van den Hengel
- 2013 Poster: Learning a Deep Compact Image Representation for Visual Tracking
  Naiyan Wang · Dit-Yan Yeung