Workshop

Visually Grounded Interaction and Language

Florian Strub ⋅ Abhishek Das ⋅ Erik Wijmans ⋅ Harm de Vries ⋅ Stefan Lee ⋅ Alane Suhr ⋅ Dor Arad Hudson


Abstract

The dominant paradigm in modern natural language understanding is learning statistical language models from text-only corpora. This approach is founded on a distributional notion of semantics, i.e. that the "meaning" of a word is based only on its relationship to other words. While effective for many applications, this approach suffers from limited semantic understanding: symbols learned this way lack any concrete grounding in the multimodal, interactive environment in which communication takes place. The symbol grounding problem first highlighted this limitation, namely that "meaningless symbols (i.e. words) cannot be grounded in anything but other meaningless symbols".

On the other hand, humans acquire language by communicating about and interacting within a rich, perceptual environment, which provides concrete groundings, e.g. to objects or concepts either physical or psychological. Thus, recent works have aimed to bridge computer vision, interactive learning, and natural language understanding through language learning tasks based on natural images or through embodied agents performing interactive tasks in physically simulated environments, often drawing on the recent successes of deep learning and reinforcement learning. We believe these lines of research offer a promising approach for building models that truly grasp the world's underlying complexity.

The goal of this third ViGIL workshop is to bring together scientists from various backgrounds (machine learning, computer vision, natural language processing, neuroscience, cognitive science, psychology, and philosophy) to share their perspectives on grounding, embodiment, and interaction. By providing this opportunity for cross-disciplinary discussion, we hope to foster new ideas about how to learn and leverage grounding in machines, as well as to build new bridges between the science of human cognition and machine learning.


Schedule

Timezone: America/Los_Angeles

8:20 AM

Opening Remarks

Florian Strub ⋅ Harm de Vries ⋅ Abhishek Das ⋅ Stefan Lee ⋅ Erik Wijmans ⋅ Dor Arad Hudson ⋅ Alane Suhr

8:30 AM

Grasping Language

Jason Baldridge

9:10 AM

From Human Language to Agent Action

Jesse Thomason

9:50 AM

Coffee Break

10:30 AM

Spotlight

10:50 AM

Why language understanding is not a solved problem

James McClelland

11:30 AM

Louis-Philippe Morency

12:10 PM

Poster Session

Candace Ross ⋅ Yassine Mrabet ⋅ Sanjay Subramanian ⋅ Geoffrey Cideron ⋅ Jesse Mu ⋅ Suvrat Bhooshan ⋅ Eda Okur Kavil ⋅ Jean-Benoit Delbrouck ⋅ Yen-Ling Kuo ⋅ Nicolas Lair ⋅ Gabriel Ilharco ⋅ T.S. Jayram ⋅ Alba María Herrera Palacio ⋅ Chihiro Fujiyama ⋅ Olivier Tieleman ⋅ Anna Potapenko ⋅ Guan-Lin Chao ⋅ Thomas Sutter ⋅ Olga Kovaleva ⋅ Farley Lai ⋅ Xin Wang ⋅ Vasu Sharma ⋅ Catalina Cangea ⋅ Nikhil Krishnaswamy ⋅ Yuta Tsuboi ⋅ Alexander Kuhnle ⋅ Khanh Nguyen ⋅ Dian Yu ⋅ Homagni Saha ⋅ Jiannan Xiang ⋅ Vijay Venkataraman ⋅ Ankita Kalra ⋅ Ning Xie ⋅ Derek Doran ⋅ Travis Goodwin ⋅ Asim Kadav ⋅ Shabnam Daghaghi ⋅ Jason Baldridge ⋅ Jialin Wu ⋅ Jingxiang Lin ⋅ Unnat Jain

1:50 PM

Lisa Anne Hendricks

2:30 PM

Linda Smith

3:10 PM

Poster Session

4:00 PM

Timothy Lillicrap

4:40 PM

Josh Tenenbaum

5:20 PM

Panel Discussion

Linda Smith ⋅ Josh Tenenbaum ⋅ Lisa Anne Hendricks ⋅ James McClelland ⋅ Timothy Lillicrap ⋅ Jesse Thomason ⋅ Jason Baldridge ⋅ Louis-Philippe Morency

6:00 PM

Closing Remarks