Referring image segmentation aims to localize all pixels of the visual objects described by a natural language sentence. Previous works directly align sentence embeddings with pixel-level embeddings to highlight the referred objects, but ignore the semantic consistency of pixels within the same object, leading to incomplete masks and localization errors in predictions. To tackle this problem, we propose CoupAlign, a simple yet effective multi-level visual-semantic alignment method that couples sentence-mask alignment with word-pixel alignment to enforce an object mask constraint, achieving more accurate localization and segmentation. Specifically, the Word-Pixel Alignment (WPA) module performs early fusion of linguistic and pixel-level features in intermediate layers of the vision and language encoders. Based on the word-pixel aligned embedding, a set of mask proposals is generated to hypothesize possible objects. Then, in the Sentence-Mask Alignment (SMA) module, the masks are weighted by the sentence embedding to localize the referred object, and finally projected back to aggregate the pixels of the target. To further enhance the learning of the two alignment modules, an auxiliary loss is designed to contrast the foreground and background pixels. By hierarchically aligning pixels and masks with linguistic features, CoupAlign captures pixel coherence at both the visual and semantic levels, thus generating more accurate predictions. Extensive experiments on popular datasets (e.g., RefCOCO and G-Ref) show that our method achieves consistent improvements over state-of-the-art methods, e.g., about 2% oIoU increase on the validation and test sets of RefCOCO. In particular, CoupAlign shows a remarkable ability to distinguish the target from multiple objects of the same class. Code will be available at https://gitee.com/mindspore/models/tree/master/research/cv/CoupAlign.
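The sentence-mask alignment step described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the function name `sentence_mask_alignment`, the dot-product similarity, the sigmoid soft masks, and the softmax weighting are all assumptions chosen only to show the idea of weighting mask proposals by a sentence embedding and projecting the result back to pixels.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def sentence_mask_alignment(pixel_emb, proposal_emb, sentence_emb):
    """Weight mask proposals by a sentence embedding and aggregate them.

    pixel_emb:    list of P pixel embeddings (each of length D)
    proposal_emb: list of N mask-proposal embeddings (each of length D)
    sentence_emb: one sentence embedding of length D
    Returns (weights, mask): N proposal weights and P per-pixel scores.
    """
    # Assumption: each proposal yields a soft mask from pixel-proposal similarity.
    soft_masks = [[sigmoid(dot(p, q)) for p in pixel_emb] for q in proposal_emb]
    # Proposals that agree with the sentence receive larger weights.
    weights = softmax([dot(q, sentence_emb) for q in proposal_emb])
    # Project back to pixels: weighted sum of the soft masks.
    mask = [
        sum(w * m[i] for w, m in zip(weights, soft_masks))
        for i in range(len(pixel_emb))
    ]
    return weights, mask


# Toy example: 4 pixels, 2 proposals, embedding size 2 (all values hypothetical).
pixels = [[4.0, -4.0], [4.0, -4.0], [-4.0, 4.0], [-4.0, 4.0]]
proposals = [[1.0, 0.0], [0.0, 1.0]]
sentence = [3.0, 0.0]  # agrees with the first proposal
weights, mask = sentence_mask_alignment(pixels, proposals, sentence)
```

In this toy setting the first proposal covers the first two pixels and matches the sentence, so those pixels receive high scores in the aggregated mask while the pixels of the other proposal are suppressed.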
Author Information
Zicheng Zhang (Xi'an Jiaotong University)
Yi Zhu (University of Chinese Academy of Sciences)
Jianzhuang Liu (Huawei Noah's Ark Lab)
Xiaodan Liang (Sun Yat-sen University)
Wei Ke (Xi'an Jiaotong University)
More from the Same Authors
2021 : One Million Scenes for Autonomous Driving: ONCE Dataset »
Jiageng Mao · Niu Minzhe · ChenHan Jiang · hanxue liang · Jingheng Chen · Xiaodan Liang · Yamin Li · Chaoqiang Ye · Wei Zhang · Zhenguo Li · Jie Yu · Hang Xu · Chunjing XU -
2021 : FFA-IR: Towards an Explainable and Reliable Medical Report Generation Benchmark »
Mingjie Li · Wenjia Cai · Rui Liu · Yuetian Weng · Xiaoyun Zhao · Cong Wang · Xin Chen · Zhong Liu · Caineng Pan · Mengke Li · yingfeng zheng · Yizhi Liu · Flora Salim · Karin Verspoor · Xiaodan Liang · Xiaojun Chang -
2021 : IconQA: A New Benchmark for Abstract Diagram Understanding and Visual Language Reasoning »
Pan Lu · Liang Qiu · Jiaqi Chen · Tanglin Xia · Yizhou Zhao · Wei Zhang · Zhou Yu · Xiaodan Liang · Song-Chun Zhu -
2021 : SODA10M: A Large-Scale 2D Self/Semi-Supervised Object Detection Dataset for Autonomous Driving »
Jianhua Han · Xiwen Liang · Hang Xu · Kai Chen · Lanqing Hong · Jiageng Mao · Chaoqiang Ye · Wei Zhang · Zhenguo Li · Xiaodan Liang · Chunjing XU -
2021 : Theorem-Aware Geometry Problem Solving with Symbolic Reasoning and Theorem Prediction »
Pan Lu · Ran Gong · Shibiao Jiang · Liang Qiu · Siyuan Huang · Xiaodan Liang · Song-Chun Zhu -
2021 : Towards Diagram Understanding and Cognitive Reasoning in Icon Question Answering »
Pan Lu · Liang Qiu · Jiaqi Chen · Tanglin Xia · Yizhou Zhao · Wei Zhang · Zhou Yu · Xiaodan Liang · Song-Chun Zhu -
2021 : Geometric Question Answering Towards Multimodal Numerical Reasoning »
Jiaqi Chen · Jianheng Tang · Jinghui Qin · Xiaodan Liang · Lingbo Liu · Eric Xing · Liang Lin -
2022 Poster: Towards Hard-pose Virtual Try-on via 3D-aware Global Correspondence Learning »
Zaiyu Huang · Hanhui Li · Zhenyu Xie · Michael Kampffmeyer · qingling Cai · Xiaodan Liang -
2022 Poster: FNeVR: Neural Volume Rendering for Face Animation »
Bohan Zeng · Boyu Liu · Hong Li · Xuhui Liu · Jianzhuang Liu · Dapeng Chen · Wei Peng · Baochang Zhang -
2023 Poster: RIO: A Benchmark for Reasoning Intention-Oriented Objects in Open Environments »
Mengxue Qu · Yu Wu · Wu Liu · Xiaodan Liang · Jingkuan Song · Yao Zhao · Yunchao Wei -
2022 Spotlight: Lightning Talks 4B-3 »
Zicheng Zhang · Mancheng Meng · Antoine Guedon · Yue Wu · Wei Mao · Zaiyu Huang · Peihao Chen · Shizhe Chen · Yongwei Chen · Keqiang Sun · Yi Zhu · chen rui · Hanhui Li · Dongyu Ji · Ziyan Wu · miaomiao Liu · Pascal Monasse · Yu Deng · Shangzhe Wu · Pierre-Louis Guhur · Jiaolong Yang · Kunyang Lin · Makarand Tapaswi · Zhaoyang Huang · Terrence Chen · Jiabao Lei · Jianzhuang Liu · Vincent Lepetit · Zhenyu Xie · Richard I Hartley · Dinggang Shen · Xiaodan Liang · Runhao Zeng · Cordelia Schmid · Michael Kampffmeyer · Mathieu Salzmann · Ning Zhang · Fangyun Wei · Yabin Zhang · Fan Yang · Qifeng Chen · Wei Ke · Quan Wang · Thomas Li · qingling Cai · Kui Jia · Ivan Laptev · Mingkui Tan · Xin Tong · Hongsheng Li · Xiaodan Liang · Chuang Gan -
2022 Spotlight: Towards Hard-pose Virtual Try-on via 3D-aware Global Correspondence Learning »
Zaiyu Huang · Hanhui Li · Zhenyu Xie · Michael Kampffmeyer · qingling Cai · Xiaodan Liang -
2022 Spotlight: CoupAlign: Coupling Word-Pixel with Sentence-Mask Alignments for Referring Image Segmentation »
Zicheng Zhang · Yi Zhu · Jianzhuang Liu · Xiaodan Liang · Wei Ke -
2022 Poster: Structure-Preserving 3D Garment Modeling with Neural Sewing Machines »
Xipeng Chen · Guangrun Wang · Dizhong Zhu · Xiaodan Liang · Philip Torr · Liang Lin -
2022 Poster: Wukong: A 100 Million Large-scale Chinese Cross-modal Pre-training Benchmark »
Jiaxi Gu · Xiaojun Meng · Guansong Lu · Lu Hou · Niu Minzhe · Xiaodan Liang · Lewei Yao · Runhui Huang · Wei Zhang · Xin Jiang · Chunjing XU · Hang Xu -
2022 Poster: DetCLIP: Dictionary-Enriched Visual-Concept Paralleled Pre-training for Open-world Detection »
Lewei Yao · Jianhua Han · Youpeng Wen · Xiaodan Liang · Dan Xu · Wei Zhang · Zhenguo Li · Chunjing XU · Hang Xu -
2022 Poster: Effective Adaptation in Multi-Task Co-Training for Unified Autonomous Driving »
Xiwen Liang · Yangxin Wu · Jianhua Han · Hang Xu · Chunjing XU · Xiaodan Liang -
2021 Workshop: Math AI for Education (MATHAI4ED): Bridging the Gap Between Research and Smart Education »
Pan Lu · Yuhuai Wu · Sean Welleck · Xiaodan Liang · Eric Xing · James McClelland -
2021 Poster: Towards Scalable Unpaired Virtual Try-On via Patch-Routed Spatially-Adaptive GAN »
Zhenyu Xie · Zaiyu Huang · Fuwei Zhao · Haoye Dong · Michael Kampffmeyer · Xiaodan Liang -
2020 Poster: Self-Adaptively Learning to Demoiré from Focused and Defocused Image Pairs »
Lin Liu · Shanxin Yuan · Jianzhuang Liu · Liping Bao · Gregory Slabaugh · Qi Tian -
2020 Poster: AutoSync: Learning to Synchronize for Data-Parallel Distributed Deep Learning »
Hao Zhang · Yuan Li · Zhijie Deng · Xiaodan Liang · Lawrence Carin · Eric Xing -
2020 Poster: Auto-Panoptic: Cooperative Multi-Component Architecture Search for Panoptic Segmentation »
Yangxin Wu · Gengwei Zhang · Hang Xu · Xiaodan Liang · Liang Lin -
2020 Poster: Towards Interpretable Natural Language Understanding with Explanations as Latent Variables »
Wangchunshu Zhou · Jinyi Hu · Hanlin Zhang · Xiaodan Liang · Maosong Sun · Chenyan Xiong · Jian Tang -
2019 Poster: Heterogeneous Graph Learning for Visual Commonsense Reasoning »
Weijiang Yu · Jingwen Zhou · Weihao Yu · Xiaodan Liang · Nong Xiao -
2019 Spotlight: Heterogeneous Graph Learning for Visual Commonsense Reasoning »
Weijiang Yu · Jingwen Zhou · Weihao Yu · Xiaodan Liang · Nong Xiao -
2018 Poster: Symbolic Graph Reasoning Meets Convolutions »
Xiaodan Liang · Zhiting Hu · Hao Zhang · Liang Lin · Eric Xing -
2018 Poster: Deep Generative Models with Learnable Knowledge Constraints »
Zhiting Hu · Zichao Yang · Russ Salakhutdinov · LIANHUI Qin · Xiaodan Liang · Haoye Dong · Eric Xing -
2018 Poster: Hybrid Retrieval-Generation Reinforced Agent for Medical Image Report Generation »
Yuan Li · Xiaodan Liang · Zhiting Hu · Eric Xing -
2018 Poster: Hybrid Knowledge Routed Modules for Large-scale Object Detection »
ChenHan Jiang · Hang Xu · Xiaodan Liang · Liang Lin -
2018 Poster: Soft-Gated Warping-GAN for Pose-Guided Person Image Synthesis »
Haoye Dong · Xiaodan Liang · Ke Gong · Hanjiang Lai · Jia Zhu · Jian Yin -
2017 Poster: Structured Generative Adversarial Networks »
Zhijie Deng · Hao Zhang · Xiaodan Liang · Luona Yang · Shizhen Xu · Jun Zhu · Eric Xing -
2016 Poster: Tree-Structured Reinforcement Learning for Sequential Object Localization »
Zequn Jie · Xiaodan Liang · Jiashi Feng · Xiaojie Jin · Wen Lu · Shuicheng Yan