Timezone: »
Constructing benchmarks that test the abilities of modern natural language understanding models is difficult - pre-trained language models exploit artifacts in benchmarks to achieve human parity, but still fail on adversarial examples and make errors that demonstrate a lack of common sense. In this work, we propose gamification as a framework for data construction. The goal of players in the game is to compose questions that mislead a rival AI while using specific phrases for extra points. The game environment leads to enhanced user engagement and simultaneously gives the game designer control over the collected data, allowing us to collect high-quality data at scale. Using our method we create CommonsenseQA 2.0, which includes 14,343 yes/no questions, and demonstrate its difficulty for models that are orders-of-magnitude larger than the AI used in the game itself.Our best baseline, the T5-based Unicorn with 11B parameters achieves an accuracy of 70.2%, substantially higher than GPT-3 (52.9%) in a few-shot inference setup. Both score well below human performance which is at 94.1%.
Author Information
Alon Talmor (Tel Aviv University)
Ori Yoran (Tel Aviv University)
Ronan Le Bras (Allen Institute for AI)
Chandra Bhagavatula (Allen Institute for Artificial Intelligence)
Yoav Goldberg (Bar-Ilan University)
Yejin Choi (University of Washington)
Jonathan Berant (Tel Aviv University)
More from the Same Authors
-
2021 : CommonsenseQA 2.0: Exposing the Limits of AI through Gamification »
Alon Talmor · Ori Yoran · Ronan Le Bras · Chandra Bhagavatula · Yoav Goldberg · Yejin Choi · Jonathan Berant -
2021 : NaturalProofs: Mathematical Theorem Proving in Natural Language »
Sean Welleck · Jiacheng Liu · Ronan Le Bras · Hanna Hajishirzi · Yejin Choi · Kyunghyun Cho -
2021 : Towards Grounded Natural Language Proof Generation »
Sean Welleck · Jiacheng Liu · Yejin Choi -
2022 : Information-Theoretic Evaluation of Free-Text Rationales with Conditional $\mathcal{V}$-Information »
Hanjie Chen · Faeze Brahman · Xiang Ren · Yangfeng Ji · Yejin Choi · Swabha Swayamdipta -
2023 Poster: Localized Symbolic Knowledge Distillation for Visual Commonsense Models »
Jae Sung Park · Jack Hessel · Khyathi Chandu · Paul Pu Liang · Ximing Lu · Qiuyuan Huang · Peter West · Jianfeng Gao · Ali Farhadi · Yejin Choi -
2023 Poster: SwiftSage: A Generative Agent with Fast and Slow Thinking for Complex Interactive Tasks »
Bill Yuchen Lin · Yicheng Fu · Karina Yang · Prithviraj (Raj) Ammanabrolu · Faeze Brahman · Shiyu Huang · Chandra Bhagavatula · Yejin Choi · Xiang Ren -
2023 Poster: Faith and Fate: Limits of Transformers on Compositionality »
Nouha Dziri · Ximing Lu · Melanie Sclar · Xiang (Lorraine) Li · Liwei Jiang · Bill Yuchen Lin · Sean Welleck · Peter West · Chandra Bhagavatula · Ronan Le Bras · Jena Hwang · Soumya Sanyal · Xiang Ren · Allyson Ettinger · Zaid Harchaoui · Yejin Choi -
2023 Poster: Syntactic Binding in Diffusion Models: Enhancing Attribute Correspondence through Attention Map Alignment »
Royi Rassin · Eran Hirsch · Daniel Glickman · Shauli Ravfogel · Yoav Goldberg · Gal Chechik -
2023 Poster: From Pixels to UI Actions: Learning to Follow Instructions via Graphical User Interfaces »
Peter Shaw · Mandar Joshi · James Cohan · Jonathan Berant · Panupong Pasupat · Hexiang Hu · Urvashi Khandelwal · Kenton Lee · Kristina N Toutanova -
2023 Poster: Multimodal C4: An Open, Billion-scale Corpus of Images Interleaved with Text »
Wanrong Zhu · Jack Hessel · Anas Awadalla · Samir Yitzhak Gadre · Jesse Dodge · Alex Fang · Youngjae Yu · Ludwig Schmidt · William Yang Wang · Yejin Choi -
2023 Poster: RealTime QA: What's the Answer Right Now? »
Jungo Kasai · Keisuke Sakaguchi · yoichi takahashi · Ronan Le Bras · Akari Asai · Xinyan Yu · Dragomir Radev · Noah Smith · Yejin Choi · Kentaro Inui -
2023 Oral: Syntactic Binding in Diffusion Models: Enhancing Attribute Correspondence through Attention Map Alignment »
Royi Rassin · Eran Hirsch · Daniel Glickman · Shauli Ravfogel · Yoav Goldberg · Gal Chechik -
2023 Workshop: AI meets Moral Philosophy and Moral Psychology: An Interdisciplinary Dialogue about Computational Ethics »
Sydney Levine · Liwei Jiang · Jared Moore · Zhijing Jin · Yejin Choi -
2022 Spotlight: Lightning Talks 2A-4 »
Sarthak Mittal · Richard Grumitt · Zuoyu Yan · Lihao Wang · Dongsheng Wang · Alexander Korotin · Jiangxin Sun · Ankit Gupta · Vage Egiazarian · Tengfei Ma · Yi Zhou · Yishi Xu · Albert Gu · Biwei Dai · Chunyu Wang · Yoshua Bengio · Uros Seljak · Miaoge Li · Guillaume Lajoie · Yiqun Wang · Liangcai Gao · Lingxiao Li · Jonathan Berant · Huang Hu · Xiaoqing Zheng · Zhibin Duan · Hanjiang Lai · Evgeny Burnaev · Zhi Tang · Zhi Jin · Xuanjing Huang · Chaojie Wang · Yusu Wang · Jian-Fang Hu · Bo Chen · Chao Chen · Hao Zhou · Mingyuan Zhou -
2022 Spotlight: Diagonal State Spaces are as Effective as Structured State Spaces »
Ankit Gupta · Albert Gu · Jonathan Berant -
2022 Poster: COLD Decoding: Energy-based Constrained Text Generation with Langevin Dynamics »
Lianhui Qin · Sean Welleck · Daniel Khashabi · Yejin Choi -
2022 Poster: Diagonal State Spaces are as Effective as Structured State Spaces »
Ankit Gupta · Albert Gu · Jonathan Berant -
2022 Poster: QUARK: Controllable Text Generation with Reinforced Unlearning »
Ximing Lu · Sean Welleck · Jack Hessel · Liwei Jiang · Lianhui Qin · Peter West · Prithviraj Ammanabrolu · Yejin Choi -
2022 Poster: NaturalProver: Grounded Mathematical Proof Generation with Language Models »
Sean Welleck · Jiacheng Liu · Ximing Lu · Hannaneh Hajishirzi · Yejin Choi -
2021 : Panel Discussion »
Pascal Poupart · Ali Ghodsi · Luke Zettlemoyer · Sameer Singh · Kevin Duh · Yejin Choi · Lu Hou -
2021 : Battling with Larger Models through Grounding and Searching »
Yejin Choi -
2021 Oral: MERLOT: Multimodal Neural Script Knowledge Models »
Rowan Zellers · Ximing Lu · Jack Hessel · Youngjae Yu · Jae Sung Park · Jize Cao · Ali Farhadi · Yejin Choi -
2021 : NaturalProofs: Mathematical Theorem Proving in Natural Language »
Sean Welleck · Jiacheng Liu · Ronan Le Bras · Hanna Hajishirzi · Yejin Choi · Kyunghyun Cho -
2021 Poster: Divergence Frontiers for Generative Models: Sample Complexity, Quantization Effects, and Frontier Integrals »
Lang Liu · Krishna Pillutla · Sean Welleck · Sewoong Oh · Yejin Choi · Zaid Harchaoui -
2021 Poster: MERLOT: Multimodal Neural Script Knowledge Models »
Rowan Zellers · Ximing Lu · Jack Hessel · Youngjae Yu · Jae Sung Park · Jize Cao · Ali Farhadi · Yejin Choi -
2021 Poster: MAUVE: Measuring the Gap Between Neural Text and Human Text using Divergence Frontiers »
Krishna Pillutla · Swabha Swayamdipta · Rowan Zellers · John Thickstun · Sean Welleck · Yejin Choi · Zaid Harchaoui -
2021 Oral: MAUVE: Measuring the Gap Between Neural Text and Human Text using Divergence Frontiers »
Krishna Pillutla · Swabha Swayamdipta · Rowan Zellers · John Thickstun · Sean Welleck · Yejin Choi · Zaid Harchaoui -
2020 : Panel Discussion & Closing »
Yejin Choi · Alexei Efros · Chelsea Finn · Kristen Grauman · Quoc V Le · Yann LeCun · Ruslan Salakhutdinov · Eric Xing -
2020 : QA: Yejin Choi »
Yejin Choi -
2020 : Invited Talk: Yejin Choi »
Yejin Choi -
2020 : Adversarial, Socially Aware, and Commonsensical Data »
Yejin Choi -
2020 Poster: Leap-Of-Thought: Teaching Pre-Trained Models to Systematically Reason Over Implicit Knowledge »
Alon Talmor · Oyvind Tafjord · Peter Clark · Yoav Goldberg · Jonathan Berant -
2020 Spotlight: Leap-Of-Thought: Teaching Pre-Trained Models to Systematically Reason Over Implicit Knowledge »
Alon Talmor · Oyvind Tafjord · Peter Clark · Yoav Goldberg · Jonathan Berant -
2019 : Invited Talk (Yejin Choi) »
Yejin Choi -
2019 : Yejin Choi »
Yejin Choi -
2019 Poster: Defending Against Neural Fake News »
Rowan Zellers · Ari Holtzman · Hannah Rashkin · Yonatan Bisk · Ali Farhadi · Franziska Roesner · Yejin Choi -
2019 Poster: A Little Is Enough: Circumventing Defenses For Distributed Learning »
Moran Baruch · Gilad Baruch · Yoav Goldberg -
2018 Poster: Mapping Images to Scene Graphs with Permutation-Invariant Structured Prediction »
Roei Herzig · Moshiko Raboh · Gal Chechik · Jonathan Berant · Amir Globerson -
2018 Poster: Memory Augmented Policy Optimization for Program Synthesis and Semantic Parsing »
Chen Liang · Mohammad Norouzi · Jonathan Berant · Quoc V Le · Ni Lao -
2018 Spotlight: Memory Augmented Policy Optimization for Program Synthesis and Semantic Parsing »
Chen Liang · Mohammad Norouzi · Jonathan Berant · Quoc V Le · Ni Lao -
2017 Poster: On-the-fly Operation Batching in Dynamic Computation Graphs »
Graham Neubig · Yoav Goldberg · Chris Dyer -
2014 Poster: Neural Word Embedding as Implicit Matrix Factorization »
Omer Levy · Yoav Goldberg