Deep neural networks continue to advance the state of the art in image recognition tasks through various methods. However, applications of these methods to multimodal settings remain limited. We present Multimodal Residual Networks (MRN) for multimodal residual learning in visual question answering, extending the idea of deep residual learning. Unlike deep residual learning, MRN effectively learns a joint representation from visual and language information. The main idea is to use element-wise multiplication for the joint residual mappings, exploiting the residual learning of attentional models from recent studies. Based on our study, we explore various alternative models introduced by multimodality. We achieve state-of-the-art results on the Visual QA dataset for both the Open-Ended and Multiple-Choice tasks. Moreover, we introduce a novel method to visualize the attention effect of the joint representations for each learning block using the back-propagation algorithm, even though the visual features are collapsed without spatial information.
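The main idea above, element-wise multiplication for a joint residual mapping, can be illustrated with a minimal sketch. This is not the authors' implementation: the function names (`matvec`, `tanh_vec`, `joint_residual_block`), the tanh nonlinearity, the single linear layer per modality, the linear shortcut `W_skip`, and all dimensions are assumptions chosen only for illustration. The abstract does not specify these details.

```python
import math
import random

def matvec(W, x):
    """Multiply a matrix (list of rows) by a vector (list of floats)."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def tanh_vec(x):
    """Apply tanh to each element of a vector."""
    return [math.tanh(xi) for xi in x]

def joint_residual_block(q, v, W_q, W_v, W_skip):
    """One hypothetical learning block: shortcut(q) + F(q, v).

    F(q, v) is the element-wise product of the embedded question
    and visual vectors (an assumed form of the joint residual mapping).
    """
    fq = tanh_vec(matvec(W_q, q))            # embed the question vector
    fv = tanh_vec(matvec(W_v, v))            # embed the visual vector
    joint = [a * b for a, b in zip(fq, fv)]  # element-wise multiplication
    shortcut = matvec(W_skip, q)             # linear shortcut for the question
    return [s + j for s, j in zip(shortcut, joint)]

def random_matrix(rows, cols, rng):
    """Small random weights, used here only as placeholder parameters."""
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]

rng = random.Random(0)
d_q, d_v, d_h = 4, 6, 3                      # illustrative dimensions
q = [rng.uniform(-1, 1) for _ in range(d_q)]  # stand-in question features
v = [rng.uniform(-1, 1) for _ in range(d_v)]  # stand-in visual features
W_q = random_matrix(d_h, d_q, rng)
W_v = random_matrix(d_h, d_v, rng)
W_skip = random_matrix(d_h, d_q, rng)

h = joint_residual_block(q, v, W_q, W_v, W_skip)
```

In this sketch, when the visual vector is all zeros the joint term vanishes and the block reduces to the shortcut alone, which shows the residual structure: the visual input only contributes through the multiplicative term added to the question path.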
Author Information
Jin-Hwa Kim (Seoul National University)
Sang-Woo Lee (Seoul National University)
Donghyun Kwak (Seoul National University)
Min-Oh Heo (Seoul National University)
Ph.D. student at Seoul National University
Jeonghee Kim (Naver Labs)
Jung-Woo Ha (Naver Labs)
Byoung-Tak Zhang (Seoul National University)
More from the Same Authors
- 2021 : Partition-based Local Independence Discovery
  Inwoo Hwang · Byoung-Tak Zhang · Sanghack Lee
- 2021 : C^3: Contrastive Learning for Cross-domain Correspondence in Few-shot Image Generation
  Hyukgi Lee · Gi-Cheon Kang · Chang-Hoon Jeong · Hanwool Sul · Byoung-Tak Zhang
- 2022 Poster: Robust Imitation via Mirror Descent Inverse Reinforcement Learning
  Dong-Sig Han · Hyunseo Kim · Hyundo Lee · JeHwan Ryu · Byoung-Tak Zhang
- 2022 Poster: SelecMix: Debiased Learning by Contradicting-pair Sampling
  Inwoo Hwang · Sangjun Lee · Yunhyeok Kwak · Seong Joon Oh · Damien Teney · Jin-Hwa Kim · Byoung-Tak Zhang
- 2021 Poster: Goal-Aware Cross-Entropy for Multi-Target Reinforcement Learning
  Kibeom Kim · Min Whoo Lee · Yoonsung Kim · JeHwan Ryu · Minsu Lee · Byoung-Tak Zhang
- 2020 Workshop: BabyMind: How Babies Learn and How Machines Can Imitate
  Byoung-Tak Zhang · Gary Marcus · Angelo Cangelosi · Pia Knoeferle · Klaus Obermayer · David Vernon · Chen Yu
- 2020 : Opening Remarks: BabyMind, Byoung-Tak Zhang and Gary Marcus
  Byoung-Tak Zhang · Gary Marcus
- 2018 Poster: Answerer in Questioner's Mind: Information Theoretic Approach to Goal-Oriented Visual Dialog
  Sang-Woo Lee · Yu-Jung Heo · Byoung-Tak Zhang
- 2018 Spotlight: Answerer in Questioner's Mind: Information Theoretic Approach to Goal-Oriented Visual Dialog
  Sang-Woo Lee · Yu-Jung Heo · Byoung-Tak Zhang
- 2018 Poster: Bilinear Attention Networks
  Jin-Hwa Kim · Jaehyun Jun · Byoung-Tak Zhang
- 2017 : Posters and Coffee
  Jean-Baptiste Tristan · Yunseong Lee · Anna Veronika Dorogush · Shohei Hido · Michael Terry · Mennatullah Siam · Hidemoto Nakada · Cody Coleman · Jung-Woo Ha · Hao Zhang · Adam Stooke · Chen Meng · Christopher Kappler · Lane Schwartz · Christopher Olston · Sebastian Schelter · Minmin Sun · Daniel Kang · Waldemar Hummer · Jichan Chung · Tim Kraska · Kannan Ramchandran · Nick Hynes · Christoph Boden · Donghyun Kwak
- 2017 Poster: Overcoming Catastrophic Forgetting by Incremental Moment Matching
  Sang-Woo Lee · Jin-Hwa Kim · Jaehyun Jun · Jung-Woo Ha · Byoung-Tak Zhang
- 2017 Spotlight: Overcoming Catastrophic Forgetting by Incremental Moment Matching
  Sang-Woo Lee · Jin-Hwa Kim · Jaehyun Jun · Jung-Woo Ha · Byoung-Tak Zhang
- 2016 : PororoQA: Cartoon Video Series Dataset for Story Understanding
  KyungMin Kim · Min-Oh Heo · Byoung-Tak Zhang
- 2010 Poster: Generative Local Metric Learning for Nearest Neighbor Classification
  Yung-Kyun Noh · Byoung-Tak Zhang · Daniel Lee