
Grounding Spatio-Temporal Language with Transformers
Tristan Karch · Laetitia Teodorescu · Katja Hofmann · Clément Moulin-Frier · Pierre-Yves Oudeyer

Tue Dec 07 08:30 AM -- 10:00 AM (PST)

Language is an interface to the outside world. In order for embodied agents to use it, language must be grounded in other, sensorimotor modalities. While there is an extended literature studying how machines can learn grounded language, the topic of how to learn spatio-temporal linguistic concepts is still largely uncharted. To make progress in this direction, we here introduce a novel spatio-temporal language grounding task where the goal is to learn the meaning of spatio-temporal descriptions of behavioral traces of an embodied agent. This is achieved by training a truth function that predicts if a description matches a given history of observations. The descriptions involve time-extended predicates in past and present tense as well as spatio-temporal references to objects in the scene. To study the role of architectural biases in this task, we train several models including multimodal Transformer architectures; the latter implement different attention computations between words and objects across space and time. We test models on two classes of generalization: 1) generalization to new sentences, 2) generalization to grammar primitives. We observe that maintaining object identity in the attention computation of our Transformers is instrumental to achieving good performance on generalization overall, and that summarizing object traces in a single token has little influence on performance. We then discuss how this opens new perspectives for language-guided autonomous embodied agents.
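To make the task format concrete, the sketch below shows what a truth function takes as input and returns. It is a hand-coded illustration only: the paper learns this function from data with Transformer models, whereas here the rule is written by hand. The trace layout, the `left_of` relation, the four-part description tuple, and all names are assumptions made for this example, not the authors' environment or grammar.

```python
from typing import Dict, List, Tuple

# Hypothetical data layout: one observation maps each object name
# to its (x, y) position at a single time step.
Observation = Dict[str, Tuple[float, float]]

# Hypothetical description format: (tense, subject, relation, reference).
Description = Tuple[str, str, str, str]


def truth_function(trace: List[Observation], description: Description) -> bool:
    """Hand-written stand-in for a learned truth function.

    Returns True if the description matches the history of observations.
    Only the illustrative relation "left_of" is supported.
    """
    tense, subject, relation, reference = description
    if relation != "left_of":
        raise ValueError(f"unsupported relation: {relation}")

    # Evaluate the spatial relation at every time step of the trace.
    holds = [obs[subject][0] < obs[reference][0] for obs in trace]

    if tense == "is":
        # Present tense: the relation must hold at the last time step.
        return holds[-1]
    if tense == "was":
        # Past tense: the relation held at some earlier time step.
        return any(holds[:-1])
    raise ValueError(f"unsupported tense: {tense}")


# A two-step trace in which the cube moves from the left of the ball to its right.
trace = [
    {"cube": (0.0, 0.0), "ball": (1.0, 0.0)},
    {"cube": (2.0, 0.0), "ball": (1.0, 0.0)},
]

was_left = truth_function(trace, ("was", "cube", "left_of", "ball"))
is_left = truth_function(trace, ("is", "cube", "left_of", "ball"))
```

In this toy trace the past-tense description is true and the present-tense one is false, which is the kind of distinction a learned model has to recover from object traces over time.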

Author Information

Tristan Karch (Inria)
Laetitia Teodorescu (Inria)
Katja Hofmann (Microsoft Research)

Dr. Katja Hofmann is a Principal Researcher at the [Game Intelligence](http://aka.ms/gameintelligence/) group at [Microsoft Research Cambridge, UK](https://www.microsoft.com/en-us/research/lab/microsoft-research-cambridge/). There, she leads a research team that focuses on reinforcement learning with applications in modern video games. She and her team strongly believe that modern video games will drive a transformation of how we interact with AI technology. One of the projects developed by her team is [Project Malmo](https://www.microsoft.com/en-us/research/project/project-malmo/), which uses the popular game Minecraft as an experimentation platform for developing intelligent technology. Katja's long-term goal is to develop AI systems that learn to collaborate with people, to empower their users and help solve complex real-world problems. Before joining Microsoft Research, Katja completed her PhD in Computer Science as part of the [ILPS](https://ilps.science.uva.nl/) group at the [University of Amsterdam](https://www.uva.nl/en). She worked with Maarten de Rijke and Shimon Whiteson on interactive machine learning algorithms for search engines.

Clément Moulin-Frier (Universitat Pompeu Fabra)

I obtained my PhD in Engineering of Cognition, Interaction, Learning and Creation from Grenoble University in 2011. My main research interest is the modeling of social behavior formation in robotic agents, studying the role of cognitive, morphological, sensorimotor and environmental factors. In 2009 I was a visiting scholar in Michael Arbib's lab at the University of Southern California in Los Angeles, USA. After a short contract at the Collège de France in Paris, where I worked on probabilistic optimal control for bipedal robots, I conducted my research between 2011 and 2014 in the Flowers group at Inria Bordeaux, in the field of developmental robotics. Since 2015 I have been working as a post-doctoral researcher in the Synthetic, Perceptive, Emotive and Cognitive Systems laboratory (SPECS), in particular on European social robotics projects such as WYSIWYD [1] and SocSMCs [2]. [1] http://wysiwyd.upf.edu [2] http://socsmcs.eu

Pierre-Yves Oudeyer (Inria)
