Recently, attention-based models have been used extensively in many sequence-to-sequence learning systems. In image captioning especially, attention-based models are expected to ground the correct image regions with the proper generated words. However, at each time step of the decoding process, attention-based models usually use the hidden state of the current input to attend to the image regions. Under this setting, these attention models have a "deviated focus" problem: they calculate the attention weights based on the previous words instead of the one to be generated, impairing the performance of both grounding and captioning. In this paper, we propose Prophet Attention, which is similar in form to self-supervision. In the training stage, this module utilizes future information to calculate the "ideal" attention weights towards image regions. These calculated "ideal" weights are then used to regularize the "deviated" attention. In this manner, image regions are grounded with the correct words. The proposed Prophet Attention can be easily incorporated into existing image captioning models to improve their performance in both grounding and captioning. Experiments on the Flickr30k Entities and MSCOCO datasets show that the proposed Prophet Attention consistently outperforms baselines in both automatic metrics and human evaluations. Notably, we set new state-of-the-art results on the two benchmark datasets and achieve first place on the leaderboard of the online MSCOCO benchmark in terms of the default ranking score, i.e., CIDEr-c40.
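The regularization idea in the abstract can be illustrated with a small sketch. It is not the authors' implementation: the dot-product scoring, the L1 distance between the two weight vectors, and all names (`attend`, `prophet_regularizer`, `prev_state`, `target_word`) are assumptions made for illustration. It only shows attention weights computed from the previous state being compared against weights computed from the word to be generated.

```python
import math

def softmax(scores):
    # Numerically stable softmax over a list of floats.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attend(query, regions):
    # Dot-product attention weights of one query vector over region vectors
    # (the scoring function is an assumption, not taken from the abstract).
    scores = [sum(q * r for q, r in zip(query, region)) for region in regions]
    return softmax(scores)

def prophet_regularizer(prev_state, target_word, regions):
    # "Deviated" weights: conditioned on the state built from previous words.
    deviated = attend(prev_state, regions)
    # "Ideal" weights: conditioned on the word to be generated, which is
    # available only during training (the future information).
    ideal = attend(target_word, regions)
    # Assumed penalty: L1 distance between the two weight vectors.
    loss = sum(abs(d - i) for d, i in zip(deviated, ideal))
    return deviated, ideal, loss

# Toy example: two image regions, 2-dimensional hypothetical features.
regions = [[1.0, 0.0], [0.0, 1.0]]
prev_state = [2.0, 0.0]   # previous context points at region 0
target_word = [0.0, 2.0]  # word to be generated points at region 1
deviated, ideal, loss = prophet_regularizer(prev_state, target_word, regions)
print(deviated, ideal, loss)
```

In a real training setup, such a penalty would presumably be added to the captioning loss, and only the conventional attention would be available at inference time, since the target word is not known then.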
Author Information
Fenglin Liu (Peking University)
Xuancheng Ren (Peking University)
Xian Wu (Tencent)
Shen Ge (Tencent Medical AI Lab)
Wei Fan (Tencent)
Yuexian Zou (Peking University)
Xu Sun (Peking University)
More from the Same Authors
-
2022 Poster: Expectation-Maximization Contrastive Learning for Compact Video-and-Language Representations »
Peng Jin · Jinfa Huang · Fenglin Liu · Xian Wu · Shen Ge · Guoli Song · David Clifton · Jie Chen -
2022 : Gradient Knowledge Distillation for Pre-trained Language Models »
Lean Wang · Lei Li · Xu Sun -
2023 Poster: Theoretically Modeling Client Data Divergence for Federated Natural Language Backdoor Defense »
Zhiyuan Zhang · Deli Chen · Hao Zhou · Fandong Meng · Jie Zhou · Xu Sun -
2023 Poster: Prompt Pre-Training with Twenty-Thousand Classes for Open-Vocabulary Visual Recognition »
Shuhuai Ren · Aston Zhang · Yi Zhu · Shuai Zhang · Shuai Zheng · Mu Li · Alexander Smola · Xu Sun -
2023 Poster: FETV: A Benchmark for Fine-Grained Evaluation of Open-Domain Text-to-Video Generation »
Yuanxin Liu · Lei Li · Shuhuai Ren · Rundong Gao · Shicheng Li · Sishuo Chen · Xu Sun · Lu Hou -
2022 Spotlight: Lightning Talks 6B-3 »
Lingfeng Yang · Yao Lai · Zizheng Pan · Zhenyu Wang · Weicong Liang · Chuanyang Zheng · Jian-Wei Zhang · Peng Jin · Jing Liu · Xiuying Wei · Yao Mu · Xiang Li · YUHUI YUAN · Zizheng Pan · Yifan Sun · Yunchen Zhang · Jianfei Cai · Hao Luo · zheyang li · Jinfa Huang · Haoyu He · Yi Yang · Ping Luo · Fenglin Liu · Henghui Ding · Borui Zhao · Xiangguo Zhang · Kai Zhang · Pichao WANG · Bohan Zhuang · Wei Chen · Ruihao Gong · Zhi Yang · Xian Wu · Feng Ding · Jianfei Cai · Xiao Luo · Renjie Song · Weihong Lin · Jian Yang · Wenming Tan · Bohan Zhuang · Shanghang Zhang · Shen Ge · Fan Wang · Qi Zhang · Guoli Song · Jun Xiao · Hao Li · Ding Jia · David Clifton · Ye Ren · Fengwei Yu · Zheng Zhang · Jie Chen · Shiliang Pu · Xianglong Liu · Chao Zhang · Han Hu -
2022 Spotlight: Expectation-Maximization Contrastive Learning for Compact Video-and-Language Representations »
Peng Jin · Jinfa Huang · Fenglin Liu · Xian Wu · Shen Ge · Guoli Song · David Clifton · Jie Chen -
2022 Poster: Retrieve, Reason, and Refine: Generating Accurate and Faithful Patient Instructions »
Fenglin Liu · Bang Yang · Chenyu You · Xian Wu · Shen Ge · Zhangdaihong Liu · Xu Sun · Yang Yang · David Clifton -
2021 : Continual Learning in Large-Scale Pre-Training »
Xu Sun -
2021 Poster: Auto-Encoding Knowledge Graph for Unsupervised Medical Report Generation »
Fenglin Liu · Chenyu You · Xian Wu · Shen Ge · Sheng Wang · Xu Sun -
2021 Poster: Topology-Imbalance Learning for Semi-Supervised Node Classification »
Deli Chen · Yankai Lin · Guangxiang Zhao · Xuancheng Ren · Peng Li · Jie Zhou · Xu Sun -
2019 Poster: Understanding and Improving Layer Normalization »
Jingjing Xu · Xu Sun · Zhiyuan Zhang · Guangxiang Zhao · Junyang Lin -
2019 Poster: Aligning Visual Regions and Textual Concepts for Semantic-Grounded Image Representations »
Fenglin Liu · Yuanxin Liu · Xuancheng Ren · Xiaodong He · Xu Sun -
2014 Poster: Structure Regularization for Structured Prediction »
Xu Sun