Mainstream captioning models often follow a sequential structure to generate captions, leading to issues such as the introduction of irrelevant semantics, a lack of diversity in the generated captions, and inadequate generalization performance. In this paper, we present an alternative paradigm for image captioning, which factorizes the captioning procedure into two stages: (1) extracting an explicit semantic representation from the given image; and (2) constructing the caption based on a recursive compositional procedure in a bottom-up manner. Compared to conventional approaches, our paradigm better preserves the semantic content through an explicit factorization of semantics and syntax. By using the compositional generation procedure, caption construction follows a recursive structure, which naturally fits the properties of human language. Moreover, the proposed compositional procedure requires less data to train, generalizes better, and yields more diverse captions.
Author Information
Bo Dai (The Chinese University of Hong Kong)
Sanja Fidler (University of Toronto)
Dahua Lin (The Chinese University of Hong Kong)
More from the Same Authors
- 2020 Poster: Variational Amodal Object Completion
  Huan Ling · David Acuna · Karsten Kreis · Seung Wook Kim · Sanja Fidler
- 2020 Poster: Learning Deformable Tetrahedral Meshes for 3D Reconstruction
  Jun Gao · Wenzheng Chen · Tommy Xiang · Alec Jacobson · Morgan McGuire · Sanja Fidler
- 2019 Poster: Learning to Predict 3D Objects with an Interpolation-based Differentiable Renderer
  Wenzheng Chen · Huan Ling · Jun Gao · Edward Smith · Jaakko Lehtinen · Alec Jacobson · Sanja Fidler
- 2019 Poster: Policy Continuation with Hindsight Inverse Dynamics
  Hao Sun · Zhizhong Li · Xiaotong Liu · Bolei Zhou · Dahua Lin
- 2019 Spotlight: Policy Continuation with Hindsight Inverse Dynamics
  Hao Sun · Zhizhong Li · Xiaotong Liu · Bolei Zhou · Dahua Lin
- 2019 Demonstration: Toronto Annotation Suite
  Amlan Kar · Sanja Fidler · Jun Gao · Seung Wook Kim · Huan Ling
- 2018 Poster: Trajectory Convolution for Action Recognition
  Yue Zhao · Yuanjun Xiong · Dahua Lin
- 2017 Poster: Contrastive Learning for Image Captioning
  Bo Dai · Dahua Lin
- 2017 Poster: Teaching Machines to Describe Images with Natural Language Feedback
  Huan Ling · Sanja Fidler
- 2016 Poster: Proximal Deep Structured Models
  Shenlong Wang · Sanja Fidler · Raquel Urtasun
- 2015 Poster: Skip-Thought Vectors
  Jamie Kiros · Yukun Zhu · Russ Salakhutdinov · Richard Zemel · Raquel Urtasun · Antonio Torralba · Sanja Fidler
- 2015 Poster: 3D Object Proposals for Accurate Object Class Detection
  Xiaozhi Chen · Kaustav Kundu · Yukun Zhu · Andrew G Berneshawi · Huimin Ma · Sanja Fidler · Raquel Urtasun
- 2013 Poster: Online Learning of Nonparametric Mixture Models via Sequential Variational Approximation
  Dahua Lin
- 2012 Poster: Coupling Nonparametric Mixtures via Latent Dirichlet Processes
  Dahua Lin · John Fisher III
- 2012 Poster: 3D Object Detection and Viewpoint Estimation with a Deformable 3D Cuboid Model
  Sanja Fidler · Sven Dickinson · Raquel Urtasun
- 2012 Spotlight: 3D Object Detection and Viewpoint Estimation with a Deformable 3D Cuboid Model
  Sanja Fidler · Sven Dickinson · Raquel Urtasun
- 2010 Oral: Construction of Dependent Dirichlet Processes based on Poisson Processes
  Dahua Lin · Eric Grimson · John Fisher III
- 2010 Poster: Construction of Dependent Dirichlet Processes based on Poisson Processes
  Dahua Lin · Eric Grimson · John Fisher III
- 2009 Poster: Evaluating multi-class learning strategies in a generative hierarchical framework for object detection
  Sanja Fidler · Marko Boben · Ales Leonardis