NeurIPS Poster SuperGLUE: A Stickier Benchmark for General-Purpose Language Understanding Systems

Alex Wang · Yada Pruksachatkun · Nikita Nangia · Amanpreet Singh · Julian Michael · Felix Hill · Omer Levy · Samuel Bowman

East Exhibition Hall B, C #100

Keywords: [ Applications ] [ Natural Language Processing ] [ Algorithms -> Multitask and Transfer Learning ] [ Algorithms -> Representation Learning ] [ Data, Challenges, Implementations, and So ]

Abstract:

In the last year, new models and methods for pretraining and transfer learning have driven striking performance improvements across a range of language understanding tasks. The GLUE benchmark, introduced a little over one year ago, offers a single-number metric that summarizes progress on a diverse set of such tasks, but performance on the benchmark has recently surpassed the level of non-expert humans, suggesting limited headroom for further research. In this paper we present SuperGLUE, a new benchmark styled after GLUE with a new set of more difficult language understanding tasks, a software toolkit, and a public leaderboard. SuperGLUE is available at https://super.gluebenchmark.com.
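As a rough illustration of what a single-number metric over a diverse set of tasks can look like, the sketch below first averages the metrics within each task and then takes an unweighted average across tasks. The function name `benchmark_score`, the task names, and all score values are hypothetical placeholders for illustration, not results or code from the benchmark or its toolkit.

```python
from statistics import mean


def benchmark_score(task_scores):
    """Collapse per-task scores into one number (hypothetical helper).

    task_scores maps a task name to a dict of metric name -> score.
    Tasks that report several metrics are averaged first, so that
    every task carries equal weight in the final average.
    """
    per_task = [mean(metrics.values()) for metrics in task_scores.values()]
    return mean(per_task)


# Made-up scores for three placeholder tasks (not real results).
example = {
    "task_a": {"accuracy": 80.0},
    "task_b": {"f1": 70.0, "accuracy": 90.0},
    "task_c": {"accuracy": 60.0},
}

overall = benchmark_score(example)
print(f"overall score: {overall:.1f}")
```

Averaging within a task before averaging across tasks keeps a task that reports two metrics from counting twice in the summary number.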
