In vision-and-language grounding problems, fine-grained representations of the image are considered to be of paramount importance. Most of the current systems incorporate visual features and textual concepts as a sketch of an image. However, plainly inferred representations are usually undesirable in that they are composed of separate components, the relations of which are elusive. In this work, we aim at representing an image with a set of integrated visual regions and corresponding textual concepts, reflecting certain semantics. To this end, we build the Mutual Iterative Attention (MIA) module, which integrates correlated visual features and textual concepts, respectively, by aligning the two modalities. We evaluate the proposed approach on two representative vision-and-language grounding tasks, i.e., image captioning and visual question answering. In both tasks, the semantic-grounded image representations consistently boost the performance of the baseline models under all metrics across the board. The results demonstrate that our approach is effective and generalizes well to a wide range of models for image-related applications. (The code is available at https://github.com/fenglinliu98/MIA.)
Author Information
Fenglin Liu (Peking University)
Yuanxin Liu (Institute of Information Engineering, Chinese Academy of Sciences; SCS, University of Chinese Academy of Sciences)
Xuancheng Ren (Peking University)
Xiaodong He (JD AI Research)
Xu Sun (Peking University)
More from the Same Authors
- 2022: Gradient Knowledge Distillation for Pre-trained Language Models
  Lean Wang · Lei Li · Xu Sun
- 2022 Poster: Retrieve, Reason, and Refine: Generating Accurate and Faithful Patient Instructions
  Fenglin Liu · Bang Yang · Chenyu You · Xian Wu · Shen Ge · Zhangdaihong Liu · Xu Sun · Yang Yang · David Clifton
- 2021: Continual Learning in Large-Scale Pre-Training
  Xu Sun
- 2021 Poster: Auto-Encoding Knowledge Graph for Unsupervised Medical Report Generation
  Fenglin Liu · Chenyu You · Xian Wu · Shen Ge · Sheng Wang · Xu Sun
- 2021 Poster: Topology-Imbalance Learning for Semi-Supervised Node Classification
  Deli Chen · Yankai Lin · Guangxiang Zhao · Xuancheng Ren · Peng Li · Jie Zhou · Xu Sun
- 2020 Poster: Group Contextual Encoding for 3D Point Clouds
  Xu Liu · Chengtao Li · Jian Wang · Jingbo Wang · Boxin Shi · Xiaodong He
- 2020 Poster: Prophet Attention: Predicting Attention with Future Attention
  Fenglin Liu · Xuancheng Ren · Xian Wu · Shen Ge · Wei Fan · Yuexian Zou · Xu Sun
- 2019 Poster: Understanding and Improving Layer Normalization
  Jingjing Xu · Xu Sun · Zhiyuan Zhang · Guangxiang Zhao · Junyang Lin
- 2014 Poster: Structure Regularization for Structured Prediction
  Xu Sun