Timezone: »
The ability to compositionally map language to referents, relations, and actions is an essential component of language understanding. The recent gSCAN dataset (Ruis et al. 2020, NeurIPS) is an inspiring attempt to assess the capacity of models to learn this kind of grounding in scenarios involving navigational instructions. However, we show that gSCAN's highly constrained design means that it does not require compositional interpretation and that many details of its instructions and scenarios are not required for task success. To address these limitations, we propose ReaSCAN, a benchmark dataset that builds off gSCAN but requires compositional language interpretation and reasoning about entities and relations. We assess two models on ReaSCAN: a multi-modal baseline and a state-of-the-art graph convolutional neural model. These experiments show that ReaSCAN is substantially harder than gSCAN for both neural architectures. This suggests that ReaSCAN can serve as a valuable benchmark for advancing our understanding of models' compositional generalization and reasoning capabilities.
Author Information
Zhengxuan Wu (Stanford University)
Elisa Kreiss (Stanford University)
Desmond Ong (National University of Singapore)
Christopher Potts (Stanford University)
More from the Same Authors
-
2021 Spotlight: Baleen: Robust Multi-Hop Reasoning at Scale via Condensed Retrieval »
Omar Khattab · Christopher Potts · Matei Zaharia -
2023 Poster: Interpretability at Scale: Identifying Causal Mechanisms in Alpaca »
Zhengxuan Wu · Atticus Geiger · Christopher Potts · Noah Goodman -
2022 Poster: CEBaB: Estimating the Causal Effects of Real-World Concepts on NLP Model Behavior »
Eldar D Abraham · Karel D'Oosterlinck · Amir Feder · Yair Gat · Atticus Geiger · Christopher Potts · Roi Reichart · Zhengxuan Wu -
2021 : Intuitive Image Descriptions are Context-Sensitive »
Shayan Hooshmand · Elisa Kreiss · Christopher Potts -
2021 Poster: Causal Abstractions of Neural Networks »
Atticus Geiger · Hanson Lu · Thomas Icard · Christopher Potts -
2021 Poster: Decrypting Cryptic Crosswords: Semantically Complex Wordplay Puzzles as a Target for NLP »
Josh Rozner · Christopher Potts · Kyle Mahowald -
2021 Poster: Dynaboard: An Evaluation-As-A-Service Platform for Holistic Next-Generation Benchmarking »
Zhiyi Ma · Kawin Ethayarajh · Tristan Thrush · Somya Jain · Ledell Wu · Robin Jia · Christopher Potts · Adina Williams · Douwe Kiela -
2021 Poster: Baleen: Robust Multi-Hop Reasoning at Scale via Condensed Retrieval »
Omar Khattab · Christopher Potts · Matei Zaharia