Training saliency detection models with weak supervision, e.g., image-level tags or captions, is appealing because it removes the costly demand for per-pixel annotations. Despite the rapid progress of RGB-D saliency detection in the fully-supervised setting, the problem remains unexplored when only weak supervision signals are available. This paper tackles weakly-supervised RGB-D salient object detection. The key insight is to maintain per-pixel pseudo-labels that are iteratively refined by reconciling the multimodal input signals in our joint semantic mining (JSM). Considering the large variations in raw depth maps and the lack of explicit pixel-level supervision, we propose spatial semantic modeling (SSM) to capture saliency-specific depth cues from the raw depth and produce depth-refined pseudo-labels. Moreover, tags and captions are incorporated via fill-in-the-blank training in our textual semantic modeling (TSM) to estimate the confidences of competing pseudo-labels. At test time, our model involves only a lightweight sub-network of the training pipeline, i.e., it requires only an RGB image as input, thus allowing efficient inference. Extensive evaluations demonstrate the effectiveness of our approach under the weakly-supervised setting. Importantly, our method can also be adapted to both fully-supervised and unsupervised paradigms, where it attains superior performance compared to state-of-the-art dedicated methods. As a by-product, we construct a CapS dataset by augmenting an existing benchmark training set with additional image tags and captions.
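To make the training pipeline concrete, below is a minimal PyTorch sketch of one round of the pseudo-label refinement loop described in the abstract. It is an illustrative assumption, not the paper's released code: the names SaliencyNet, ssm_refine, tsm_confidence, and jsm_step are hypothetical, and the SSM/TSM steps are stubbed with toy placeholders.

```python
# Minimal sketch of the iterative pseudo-label refinement described above.
# All module and function names are illustrative assumptions, not the
# authors' released implementation.
import torch
import torch.nn as nn
import torch.nn.functional as F

class SaliencyNet(nn.Module):
    """Lightweight RGB-only saliency sub-network (the part kept at test time)."""
    def __init__(self):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 32, 3, padding=1), nn.ReLU())
        self.head = nn.Conv2d(32, 1, 1)

    def forward(self, rgb):
        return self.head(self.encoder(rgb))  # per-pixel saliency logits

def ssm_refine(pseudo, depth):
    """Hypothetical SSM step: sharpen pseudo-labels with depth cues,
    here crudely approximated by suppressing far-depth regions."""
    depth = (depth - depth.amin()) / (depth.amax() - depth.amin() + 1e-6)
    return pseudo * (1.0 - depth)  # toy depth gating

def tsm_confidence(pseudo, caption_emb):
    """Hypothetical TSM step: per-image confidence that would come from
    fill-in-the-blank caption modeling (stubbed to uniform weights)."""
    return torch.ones(pseudo.size(0), 1, 1, 1)

model = SaliencyNet()
opt = torch.optim.Adam(model.parameters(), lr=1e-4)

def jsm_step(rgb, depth, pseudo, caption_emb):
    """One JSM iteration: train on current pseudo-labels, then refresh them."""
    logits = model(rgb)
    conf = tsm_confidence(pseudo, caption_emb)       # weight by text confidence
    loss = (conf * F.binary_cross_entropy_with_logits(
        logits, pseudo, reduction='none')).mean()
    opt.zero_grad(); loss.backward(); opt.step()
    with torch.no_grad():                            # depth-refined pseudo-labels
        pseudo = ssm_refine(torch.sigmoid(model(rgb)), depth)
    return pseudo, loss.item()

# Toy usage: one refinement round on random data.
rgb    = torch.rand(2, 3, 64, 64)
depth  = torch.rand(2, 1, 64, 64)
pseudo = torch.rand(2, 1, 64, 64)  # initial pseudo-labels (e.g., from tag cues)
pseudo, loss = jsm_step(rgb, depth, pseudo, caption_emb=None)
```

In the full method, SSM and TSM would replace the stubs above; only the trained SaliencyNet runs at inference, consistent with the RGB-only, efficient test-time pipeline the abstract describes.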
Author Information
Jingjing Li (University of Alberta)
Wei Ji (University of Alberta)
Qi Bi (University of Amsterdam)
Cheng Yan (Beihang University)
Miao Zhang (Dalian University of Technology)
Yongri Piao (Dalian University of Technology)
Huchuan Lu (Dalian University of Technology)
Li Cheng (University of Alberta)
More from the Same Authors
- 2023 Poster: Decorate3D: Text-Driven High-Quality Texture Generation for Mesh Decoration in the Wild »
  Yanhui Guo · Xinxin Zuo · Peng Dai · Juwei Lu · Xiaolin Wu · Li Cheng · Youliang Yan · Songcen Xu · Xiaofei Wu
- 2023 Poster: Saliency Revisited using RGBD Videos: A Unified Dataset and Benchmark »
  Jingjing Li · Wei Ji · Size Wang · Wenbo Li · Li Cheng
- 2022 Spotlight: Lightning Talks 3B-4 »
  Guanghu Yuan · Yijing Liu · Li Yang · Yongri Piao · Zekang Zhang · Yaxin Xiao · Lin Chen · Hong Chang · Fajie Yuan · Guangyu Gao · Hong Chang · Qinxian Liu · Zhixiang Wei · Qingqing Ye · Chenyang Lu · Jian Meng · Haibo Hu · Xin Jin · Yudong Li · Miao Zhang · Zhiyuan Fang · Jae-sun Seo · Bingpeng MA · Jian-Wei Zhang · Shiguang Shan · Haozhe Feng · Huaian Chen · Deliang Fan · Huadi Zheng · Jianbo Jiao · Huchuan Lu · Beibei Kong · Miao Zheng · Chengfang Fang · Shujie Li · Zhongwei Wang · Yunchao Wei · Xilin Chen · Jie Shi · Kai Chen · Zihan Zhou · Lei Chen · Yi Jin · Wei Chen · Min Yang · Chenyun YU · Bo Hu · Zang Li · Yu Xu · Xiaohu Qie
- 2022 Spotlight: Semi-Supervised Video Salient Object Detection Based on Uncertainty-Guided Pseudo Labels »
  Yongri Piao · Chenyang Lu · Miao Zhang · Huchuan Lu
- 2022 Poster: Semi-Supervised Video Salient Object Detection Based on Uncertainty-Guided Pseudo Labels »
  Yongri Piao · Chenyang Lu · Miao Zhang · Huchuan Lu
- 2019 Poster: Memory-oriented Decoder for Light Field Salient Object Detection »
  Miao Zhang · Jingjing Li · Wei Ji · Yongri Piao · Huchuan Lu