Workshop

Visually grounded interaction and language

Florian Strub · Harm de Vries · Abhishek Das · Satwik Kottur · Stefan Lee · Mateusz Malinowski · Olivier Pietquin · Devi Parikh · Dhruv Batra · Aaron Courville · Jeremie Mary

Abstract

Everyday interactions require a common understanding of language, i.e. for people to communicate effectively, words (for example ‘cat’) should invoke similar beliefs over physical concepts (what cats look like, the sounds they make, how they behave, what their skin feels like etc.). However, how this ‘common understanding’ emerges is still unclear.

One appealing hypothesis is that language is tied to how we interact with the environment. As a result, meaning emerges by ‘grounding’ language in modalities in our environment (images, sounds, actions, etc.).

Recent work in machine learning has focused on bridging visual and natural language understanding through visually grounded language learning tasks, e.g. on natural images (Visual Question Answering, Visual Dialog) or through interaction with virtual physical environments. In cognitive science, progress in fMRI has made it possible to build a semantic atlas of the cerebral cortex and to decode semantic information from visual input. In psychology, recent studies show that a baby’s most likely first words are based on their visual experience, laying the foundation for a new theory of infant language acquisition and learning.

As the grounding problem requires an interdisciplinary approach, this workshop aims to gather researchers with broad expertise in various fields (machine learning, computer vision, natural language processing, neuroscience, and psychology) to discuss their cutting-edge work as well as perspectives on future directions in this exciting space of grounding and interaction.

We will accept papers related to:
— language acquisition or learning through interactions
— visual captioning, dialog, and question-answering
— reasoning in language and vision
— visual synthesis from language
— transfer learning in language and vision tasks
— navigation in virtual worlds with natural-language instructions
— machine translation with visual cues
— novel tasks that combine language, vision and actions
— understanding and modeling the relationship between language and vision in humans
— semantic systems and modeling of natural language and visual stimuli representations in the human brain

Important dates
---------------------
Submission deadline: 3rd November 2017
Extended Submission deadline: 17th November 2017

Acceptance notification (First deadline): 10th November 2017
Acceptance notification (Second deadline): 24th November 2017

Workshop: 8th December 2017

Paper details
------------------
— Contributed papers may include novel research, preliminary results, extended abstracts, position papers, or surveys
— Papers are limited to 4 pages, excluding references, in the latest camera-ready NIPS format: https://nips.cc/Conferences/2017/PaperInformation/StyleFiles
— Papers published at the main conference can be submitted without reformatting
— Please submit via email: nips2017vigil@gmail.com

Accepted papers
-----------------------
— All accepted papers will be presented during 2 poster sessions
— Up to 5 accepted papers will be invited to deliver short talks
— Accepted papers will be made publicly available as non-archival reports, allowing future submissions to archival conferences and journals

Invited Speakers
-----------------------
Raymond J. Mooney - University of Texas at Austin
Sanja Fidler - University of Toronto
Olivier Pietquin - DeepMind
Jack Gallant - University of California, Berkeley
Devi Parikh - Georgia Tech / FAIR
Felix Hill - DeepMind
Chen Yu - Indiana University

Schedule

Timezone: America/Los_Angeles

8:30 AM

Welcome!

8:45 AM

Visually Grounded Language: Past, Present, and Future...

Raymond Mooney

9:30 AM

Connecting high-level semantics with low-level vision

Sanja Fidler

10:15 AM

Break + Poster (1)

Devendra Singh Chaplot · Chih-Yao Ma · Simon Brodeur · Eri Matsuo · Ichiro Kobayashi · Seitaro Shinagawa · Koichiro Yoshino · Yuhong Guo · Ben Murdoch · Kanthashree Mysore Sathyendra · Daniel Ricks · Haichao Zhang · Joshua Peterson · Li Zhang · Mircea Mironenco · Peter Anderson · Mark Johnson · Kang Min Yoo · Guntis Barzdins · Ahmed H Zaidi · Martin Andrews · Sam Witteveen · Subbareddy Oota · Prashanth Vijayaraghavan · Ke Wang · Yan Zhu · Renars Liepins · Max Quinn · Amit Raj · Vincent Cartillier · Eric Chu · Ethan Caballero · Fritz Obermeyer

10:40 AM

The interface between vision and language in the human brain?

Jack Gallant

11:25 AM

Towards Embodied Question Answering

Devi Parikh

2:00 PM

Dialogue systems and RL: interconnecting language, vision and rewards

Olivier Pietquin