We present a novel training framework for neural sequence models, particularly for grounded dialog generation. The standard training paradigm for these models is maximum likelihood estimation (MLE), or minimizing the cross-entropy of the human responses. Across a variety of domains, a recurring problem with MLE-trained generative neural dialog models (G) is that they tend to produce 'safe' and generic responses (such as "I don't know" or "I can't tell"). In contrast, discriminative dialog models (D) that are trained to rank a list of candidate human responses outperform their generative counterparts in terms of automatic metrics, diversity, and informativeness of the responses. However, D is not useful in practice since it cannot be deployed to have real conversations with users. Our work aims to achieve the best of both worlds -- the practical usefulness of G and the strong performance of D -- via knowledge transfer from D to G. Our primary contribution is an end-to-end trainable generative visual dialog model, where G receives gradients from D as a perceptual (not adversarial) loss of the sequence sampled from G. We leverage the recently proposed Gumbel-Softmax (GS) approximation to the discrete distribution: specifically, an RNN is augmented with a sequence of GS samplers, which, coupled with the straight-through gradient estimator, enables end-to-end differentiability. We also introduce a stronger encoder for visual dialog, and employ a self-attention mechanism for answer encoding along with a metric learning loss to aid D in better capturing semantic similarities in answer responses. Overall, our proposed model outperforms the state of the art on the VisDial dataset by a significant margin (2.67% on recall@10). The source code can be downloaded from https://github.com/jiasenlu/visDial.pytorch
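The Gumbel-Softmax sampling step mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `gumbel_softmax_sample`, the example logits, and the temperature value are all hypothetical, and only the forward pass is shown. In an automatic-differentiation framework, the straight-through estimator would pass the hard one-hot vector forward while using the gradient of the soft sample in the backward pass.

```python
import math
import random


def gumbel_softmax_sample(logits, temperature, rng):
    """Return (soft, hard): a relaxed sample and its one-hot version.

    Hypothetical helper for illustration only; forward pass, no gradients.
    """
    # Perturb each logit with Gumbel(0, 1) noise: g = -log(-log(u)).
    noisy = []
    for logit in logits:
        # Clamp u away from 0 and 1 so both logarithms stay finite.
        u = min(max(rng.random(), 1e-12), 1.0 - 1e-12)
        gumbel = -math.log(-math.log(u))
        noisy.append((logit + gumbel) / temperature)

    # Numerically stable softmax over the perturbed, temperature-scaled logits.
    peak = max(noisy)
    exps = [math.exp(v - peak) for v in noisy]
    total = sum(exps)
    soft = [e / total for e in exps]

    # Hard one-hot vector at the argmax of the soft sample. A straight-through
    # estimator would use this in the forward pass and the gradient of `soft`
    # in the backward pass.
    best = max(range(len(soft)), key=soft.__getitem__)
    hard = [1.0 if i == best else 0.0 for i in range(len(soft))]
    return soft, hard


# Example: one sampled "token" over a three-word vocabulary (assumed values).
rng = random.Random(0)
soft, hard = gumbel_softmax_sample([2.0, 0.5, -1.0], temperature=0.5, rng=rng)
```

Lower temperatures push the soft sample closer to the one-hot vector, at the cost of higher-variance gradients; higher temperatures give smoother but more biased samples.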
Author Information
Jiasen Lu (Georgia Tech)
Anitha Kannan
Jianwei Yang (Georgia Tech)
Devi Parikh (Georgia Tech / Facebook AI Research (FAIR))
Dhruv Batra (FAIR (Meta) / Georgia Tech)
More from the Same Authors
-
2021 Spotlight: Habitat 2.0: Training Home Assistants to Rearrange their Habitat »
Andrew Szot · Alexander Clegg · Eric Undersander · Erik Wijmans · Yili Zhao · John Turner · Noah Maestre · Mustafa Mukadam · Devendra Singh Chaplot · Oleksandr Maksymets · Aaron Gokaslan · Vladimír Vondruš · Sameer Dharur · Franziska Meier · Wojciech Galuba · Angel Chang · Zsolt Kira · Vladlen Koltun · Jitendra Malik · Manolis Savva · Dhruv Batra -
2021 : Habitat-Matterport 3D Dataset (HM3D): 1000 Large-scale 3D Environments for Embodied AI »
Santhosh Kumar Ramakrishnan · Aaron Gokaslan · Erik Wijmans · Oleksandr Maksymets · Alexander Clegg · John Turner · Eric Undersander · Wojciech Galuba · Andrew Westbury · Angel Chang · Manolis Savva · Yili Zhao · Dhruv Batra -
2022 : Fifteen-minute Competition Overview Video »
Dhruv Batra · Manolis Savva · Zsolt Kira · Vincent-Pierre Berges · Karmesh Yadav · Angel Chang · Andrew Szot · Alexander Clegg · Aaron Gokaslan -
2023 Poster: Where are we in the search for an Artificial Visual Cortex for Embodied Intelligence? »
Arjun Majumdar · Karmesh Yadav · Sergio Arnaud · Jason Yecheng Ma · Claire Chen · Sneha Silwal · Aryan Jain · Vincent-Pierre Berges · Tingfan Wu · Jay Vakil · Pieter Abbeel · Jitendra Malik · Dhruv Batra · Yixin Lin · Oleksandr Maksymets · Aravind Rajeswaran · Franziska Meier -
2023 Competition: The HomeRobot Open Vocabulary Mobile Manipulation Challenge »
Sriram Yenamandra · Arun Ramachandran · Mukul Khanna · Karmesh Yadav · Devendra Singh Chaplot · Gunjan Chhablani · Alexander Clegg · Theophile Gervet · Vidhi Jain · Ruslan Partsey · Ram Ramrakhya · Andrew Szot · Austin Wang · Tsung-Yen Yang · Aaron Edsinger · Charles Kemp · Binit Shah · Zsolt Kira · Dhruv Batra · Roozbeh Mottaghi · Yonatan Bisk · Chris Paxton -
2022 Competition: Habitat Rearrangement Challenge »
Andrew Szot · Karmesh Yadav · Alexander Clegg · Vincent-Pierre Berges · Aaron Gokaslan · Angel Chang · Manolis Savva · Zsolt Kira · Dhruv Batra -
2022 Poster: VER: Scaling On-Policy RL Leads to the Emergence of Navigation in Embodied Rearrangement »
Erik Wijmans · Irfan Essa · Dhruv Batra -
2022 Poster: SoundSpaces 2.0: A Simulation Platform for Visual-Acoustic Learning »
Changan Chen · Carl Schissler · Sanchit Garg · Philip Kobernik · Alexander Clegg · Paul Calamia · Dhruv Batra · Philip Robinson · Kristen Grauman -
2022 Poster: ZSON: Zero-Shot Object-Goal Navigation using Multimodal Goal Embeddings »
Arjun Majumdar · Gunjan Aggarwal · Bhavika Devnani · Judy Hoffman · Dhruv Batra -
2021 : Habitat 2.0: Training Home Assistants to Rearrange their Habitat »
Andrew Szot · Alexander Clegg · Eric Undersander · Erik Wijmans · Yili Zhao · Noah Maestre · Mustafa Mukadam · Oleksandr Maksymets · Aaron Gokaslan · Sameer Dharur · Franziska Meier · Wojciech Galuba · Angel Chang · Zsolt Kira · Vladlen Koltun · Jitendra Malik · Manolis Savva · Dhruv Batra -
2021 : AI for Augmenting Human Creativity »
Devi Parikh -
2021 : Career and Life: Panel Discussion - Bo Li, Adriana Romero-Soriano, Devi Parikh, and Emily Denton »
Remi Denton · Devi Parikh · Bo Li · Adriana Romero -
2021 Poster: SOAT: A Scene- and Object-Aware Transformer for Vision-and-Language Navigation »
Abhinav Moudgil · Arjun Majumdar · Harsh Agrawal · Stefan Lee · Dhruv Batra -
2021 Poster: Habitat 2.0: Training Home Assistants to Rearrange their Habitat »
Andrew Szot · Alexander Clegg · Eric Undersander · Erik Wijmans · Yili Zhao · John Turner · Noah Maestre · Mustafa Mukadam · Devendra Singh Chaplot · Oleksandr Maksymets · Aaron Gokaslan · Vladimír Vondruš · Sameer Dharur · Franziska Meier · Wojciech Galuba · Angel Chang · Zsolt Kira · Vladlen Koltun · Jitendra Malik · Manolis Savva · Dhruv Batra -
2021 : Open Catalyst Challenge + Q&A »
Abhishek Das · Muhammed Shuaibi · Siddharth Goyal · Adeesh Kolluru · Janice Lan · Aini Palizhati · Anuroop Sriram · Brandon Wood · Aditya Grover · Devi Parikh · Zachary Ulissi · Larry Zitnick -
2021 Poster: Human-Adversarial Visual Question Answering »
Sasha Sheng · Amanpreet Singh · Vedanuj Goswami · Jose Magana · Tristan Thrush · Wojciech Galuba · Devi Parikh · Douwe Kiela -
2020 Poster: Dialog without Dialog Data: Learning Visual Dialog Agents from VQA Data »
Michael Cogswell · Jiasen Lu · Rishabh Jain · Stefan Lee · Devi Parikh · Dhruv Batra -
2020 : Discussion Panel: Hugo Larochelle, Finale Doshi-Velez, Devi Parikh, Marc Deisenroth, Julien Mairal, Katja Hofmann, Phillip Isola, and Michael Bowling »
Hugo Larochelle · Finale Doshi-Velez · Marc Deisenroth · Devi Parikh · Julien Mairal · Katja Hofmann · Phillip Isola · Michael Bowling -
2019 Poster: Cross-channel Communication Networks »
Jianwei Yang · Zhile Ren · Chuang Gan · Hongyuan Zhu · Devi Parikh -
2019 Poster: ViLBERT: Pretraining Task-Agnostic Visiolinguistic Representations for Vision-and-Language Tasks »
Jiasen Lu · Dhruv Batra · Devi Parikh · Stefan Lee -
2019 Poster: RUBi: Reducing Unimodal Biases for Visual Question Answering »
Remi Cadene · Corentin Dancette · Hedi Ben younes · Matthieu Cord · Devi Parikh -
2019 Poster: Chasing Ghosts: Instruction Following as Bayesian State Tracking »
Peter Anderson · Ayush Shrivastava · Devi Parikh · Dhruv Batra · Stefan Lee -
2018 Workshop: Visually grounded interaction and language »
Florian Strub · Harm de Vries · Erik Wijmans · Samyak Datta · Ethan Perez · Mateusz Malinowski · Stefan Lee · Peter Anderson · Aaron Courville · Jeremie Mary · Dhruv Batra · Devi Parikh · Olivier Pietquin · Chiori Hori · Tim Marks · Anoop Cherian -
2017 : Best of Both Worlds: Transferring Knowledge from Discriminative Learning to a Generative Visual Dialog Model »
Jiasen Lu -
2017 : Morning panel discussion »
Jürgen Schmidhuber · Noah Goodman · Anca Dragan · Pushmeet Kohli · Dhruv Batra -
2017 : Invited Talk 2 »
Dhruv Batra -
2017 : Panel Discussion »
Felix Hill · Olivier Pietquin · Jack Gallant · Raymond Mooney · Sanja Fidler · Chen Yu · Devi Parikh -
2017 : Towards Embodied Question Answering »
Devi Parikh -
2017 Workshop: Visually grounded interaction and language »
Florian Strub · Harm de Vries · Abhishek Das · Satwik Kottur · Stefan Lee · Mateusz Malinowski · Olivier Pietquin · Devi Parikh · Dhruv Batra · Aaron Courville · Jeremie Mary -
2016 Poster: Hierarchical Question-Image Co-Attention for Visual Question Answering »
Jiasen Lu · Jianwei Yang · Dhruv Batra · Devi Parikh -
2011 Poster: Understanding the Intrinsic Memorability of Images »
Phillip Isola · Devi Parikh · Antonio Torralba · Aude Oliva