Poster
Variational Structured Semantic Inference for Diverse Image Captioning
Fuhai Chen · Rongrong Ji · Jiayi Ji · Xiaoshuai Sun · Baochang Zhang · Xuri Ge · Yongjian Wu · Feiyue Huang · Yan Wang

Wed Dec 11 05:00 PM -- 07:00 PM (PST) @ East Exhibition Hall B + C #118

Despite the exciting progress in image captioning, generating diverse captions for a given image remains an open problem. Existing methods typically apply generative models such as the Variational Auto-Encoder to diversify the captions, but they neglect two key factors of diverse expression: lexical diversity and syntactic diversity. To model these two inherent diversities in image captioning, we propose a Variational Structured Semantic Inferring model (termed VSSI-cap) executed in a novel structured encoder-inferer-decoder schema. The main innovation of VSSI-cap is a novel structure, the Variational Multi-modal Inferring tree (termed VarMI-tree). In particular, conditioned on the visual-textual features from the encoder, the VarMI-tree models the lexical and syntactic diversities by inferring their latent variables (with variations) through approximate posterior inference guided by a visual semantic prior. Then, a reconstruction loss and the posterior-prior KL-divergence are jointly estimated to optimize the VSSI-cap model. Finally, diverse captions are generated from the visual features and the latent variables of this structured encoder-inferer-decoder model. Experiments on the benchmark dataset show that the proposed VSSI-cap achieves significant improvements over the state of the art.
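
The training objective named in the abstract (a reconstruction loss plus a posterior-prior KL-divergence) can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes a single latent vector with diagonal Gaussian posterior and prior, whereas the abstract describes a tree of lexical and syntactic latent variables. The function names, the Gaussian parameterization, and the `kl_weight` parameter are all hypothetical.

```python
import math


def gaussian_kl(mu_q, logvar_q, mu_p, logvar_p):
    """KL(q || p) between two diagonal Gaussians, summed over dimensions.

    q plays the role of the approximate posterior and p the prior
    (assumed Gaussian here purely for illustration).
    """
    total = 0.0
    for mq, lq, mp, lp in zip(mu_q, logvar_q, mu_p, logvar_p):
        total += 0.5 * (lp - lq + (math.exp(lq) + (mq - mp) ** 2) / math.exp(lp) - 1.0)
    return total


def negative_elbo(token_log_probs, mu_q, logvar_q, mu_p, logvar_p, kl_weight=1.0):
    """Reconstruction loss plus weighted posterior-prior KL term.

    token_log_probs: log-probabilities the decoder assigns to the
    ground-truth caption tokens (hypothetical input).
    """
    reconstruction = -sum(token_log_probs)
    return reconstruction + kl_weight * gaussian_kl(mu_q, logvar_q, mu_p, logvar_p)


if __name__ == "__main__":
    # Toy numbers: a two-token caption and a one-dimensional latent.
    loss = negative_elbo([-1.0, -2.0], [1.0], [0.0], [0.0], [0.0])
    print(loss)
```

In a setting like the one the abstract describes, the prior parameters would depend on the image and a term of this kind would presumably be accumulated over the latent variables of the tree; the sketch keeps a single latent to stay short.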

Author Information

Fuhai Chen (Xiamen University)

Fuhai Chen is currently a final-year Ph.D. student in the Artificial Intelligence Department of Xiamen University, advised by Prof. Rongrong Ji. He received the B.S. degree in Cognitive Science and Technology from Xiamen University in 2014. He obtained the M.S.-Ph.D. qualification and completed his M.S. at Xiamen University in 2016. His research interests are in computer vision, multimedia, and machine learning. He is currently seeking a postdoctoral position.

Rongrong Ji (Xiamen University, China)
Jiayi Ji (Xiamen University)
Xiaoshuai Sun (Xiamen University)
Baochang Zhang (Beihang University)
Xuri Ge (Xiamen University)
Yongjian Wu (Tencent Technology (Shanghai) Co., Ltd.)
Feiyue Huang (Tencent)
Yan Wang (Microsoft)
