Weakly-supervised vision-language grounding (WSVLG) aims to localize a target moment in a video or a specific region in an image according to a given sentence query, where only video-level or image-level sentence annotations are provided during training. Most existing approaches employ MIL-based or reconstruction-based paradigms for the WSVLG task, but the former depends heavily on the quality of randomly selected negative samples and the latter cannot directly optimize the visual-textual alignment score. In this paper, we propose a novel Counterfactual Contrastive Learning (CCL) paradigm to develop sufficient contrastive training between counterfactual positive and negative results, which are based on robust and destructive counterfactual transformations. Concretely, we design three counterfactual transformation strategies at the feature, interaction and relation levels: the feature-level method damages the visual features of selected proposals, the interaction-level approach confuses the vision-language interaction, and the relation-level strategy destroys the context clues in proposal relationships. Extensive experiments on five vision-language grounding datasets verify the effectiveness of our CCL paradigm.
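As a rough illustration of the feature-level idea in the abstract, the sketch below builds a counterfactual negative by damaging the highest-scoring proposals (a destructive transformation) and a counterfactual positive by damaging the lowest-scoring ones (a robust transformation), then contrasts the two with a margin ranking loss. Everything here is an assumption made for illustration and is not taken from the paper: the dot-product scorer, the max-pooled alignment score, zeroing as the form of damage, the margin loss, and all function names are hypothetical stand-ins.

```python
def dot(u, v):
    """Plain dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def alignment_score(proposals, query):
    """Toy visual-textual alignment: best proposal-query dot product.

    Assumption: stands in for a learned grounding model.
    """
    return max(dot(p, query) for p in proposals)


def damage(proposals, query, k, destructive):
    """Zero out the features of k selected proposals.

    destructive=True  -> damage the k highest-scoring (critical) proposals,
                         giving a counterfactual negative.
    destructive=False -> damage the k lowest-scoring (inessential) proposals,
                         giving a counterfactual positive.
    """
    scores = [dot(p, query) for p in proposals]
    order = sorted(range(len(proposals)), key=lambda i: scores[i],
                   reverse=destructive)
    selected = set(order[:k])
    return [[0.0] * len(p) if i in selected else list(p)
            for i, p in enumerate(proposals)]


def margin_ranking_loss(pos_score, neg_score, margin):
    """Hinge loss that pushes the positive score above the negative by margin."""
    return max(0.0, margin - (pos_score - neg_score))


# Four hypothetical proposal features and one query embedding.
proposals = [[1.0, 0.0], [0.5, 0.5], [0.0, 1.0], [-1.0, 0.0]]
query = [1.0, 0.0]

positive = damage(proposals, query, k=1, destructive=False)
negative = damage(proposals, query, k=1, destructive=True)

pos_score = alignment_score(positive, query)
neg_score = alignment_score(negative, query)
loss = margin_ranking_loss(pos_score, neg_score, margin=1.0)
print(pos_score, neg_score, loss)
```

In this toy setting, damaging an inessential proposal leaves the alignment score unchanged, while damaging the critical proposal lowers it, so the loss is non-zero only while the gap between the two scores is smaller than the margin.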
Author Information
Zhu Zhang (Zhejiang University)
Zhou Zhao (Zhejiang University)
Zhijie Lin (Zhejiang University)
Jieming Zhu (Huawei Noah's Ark Lab)
Xiuqiang He (Huawei Noah's Ark Lab)
More from the Same Authors
- 2021 Poster: PortaSpeech: Portable and High-Quality Generative Text-to-Speech
  Yi Ren · Jinglin Liu · Zhou Zhao
- 2021 Poster: Generalizable Multi-linear Attention Network
  Tao Jin · Zhou Zhao
- 2021 Poster: UFC-BERT: Unifying Multi-Modal Controls for Conditional Image Synthesis
  Zhu Zhang · Jianxin Ma · Chang Zhou · Rui Men · Zhikang Li · Ming Ding · Jie Tang · Jingren Zhou · Hongxia Yang
- 2019 Poster: FastSpeech: Fast, Robust and Controllable Text to Speech
  Yi Ren · Yangjun Ruan · Xu Tan · Tao Qin · Sheng Zhao · Zhou Zhao · Tie-Yan Liu
- 2018 Poster: MacNet: Transferring Knowledge from Machine Comprehension to Sequence-to-Sequence Models
  Boyuan Pan · Yazheng Yang · Hao Li · Zhou Zhao · Yueting Zhuang · Deng Cai · Xiaofei He