Poster
Language Models are Few-Shot Learners
Tom B Brown · Benjamin Mann · Nick Ryder · Melanie Subbiah · Jared D Kaplan · Prafulla Dhariwal · Arvind Neelakantan · Pranav Shyam · Girish Sastry · Amanda Askell · Sandhini Agarwal · Ariel Herbert-Voss · Gretchen M Krueger · Tom Henighan · Rewon Child · Aditya Ramesh · Daniel Ziegler · Jeffrey Wu · Clemens Winter · Chris Hesse · Mark Chen · Eric Sigler · Mateusz Litwin · Scott Gray · Benjamin Chess · Jack Clark · Christopher Berner · Sam McCandlish · Alec Radford · Ilya Sutskever · Dario Amodei

Mon Dec 07 09:00 PM -- 11:00 PM (PST) @ Poster Session 0 #49

We demonstrate that scaling up language models greatly improves task-agnostic, few-shot performance, sometimes even becoming competitive with prior state-of-the-art fine-tuning approaches. Specifically, we train GPT-3, an autoregressive language model with 175 billion parameters, 10x more than any previous non-sparse language model, and test its performance in the few-shot setting. For all tasks, GPT-3 is applied without any gradient updates or fine-tuning, with tasks and few-shot demonstrations specified purely via text interaction with the model. GPT-3 achieves strong performance on many NLP datasets, including translation, question-answering, and cloze tasks. We also identify some datasets where GPT-3's few-shot learning still struggles, as well as some datasets where GPT-3 faces methodological issues related to training on large web corpora.
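The abstract's key point is that tasks are "specified purely via text interaction with the model": the prompt itself carries a task description plus a few demonstrations, and the model completes it with no gradient updates. The sketch below illustrates that in-context setup with a hypothetical helper (the function name, prompt format, and example task are illustrative, not from the paper):

```python
# Minimal sketch of few-shot "in-context" prompting as described in the
# abstract: K demonstrations followed by a query, assembled as plain text.
# The language model (not shown) would simply complete the final "A:" line;
# no fine-tuning or gradient updates are involved.

def build_few_shot_prompt(task_description, demonstrations, query):
    """Assemble a text prompt from a task description, K (input, output)
    demonstration pairs, and a final query for the model to complete."""
    lines = [task_description]
    for example_input, example_output in demonstrations:
        lines.append(f"Q: {example_input}\nA: {example_output}")
    lines.append(f"Q: {query}\nA:")  # left open for the model's completion
    return "\n\n".join(lines)

prompt = build_few_shot_prompt(
    "Translate English to French.",
    [("sea otter", "loutre de mer"), ("cheese", "fromage")],
    "plush giraffe",
)
print(prompt)
```

In the zero-shot variant the demonstrations list would be empty, leaving only the task description and the query.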

Author Information

Tom B Brown (Google Brain)

Research Engineer @ Google Brain

Ben Mann (OpenAI)
Nick Ryder (OpenAI)
Melanie Subbiah (OpenAI)
Jared D Kaplan (Johns Hopkins University)
Prafulla Dhariwal (OpenAI)
Arvind Neelakantan (OpenAI)
Pranav Shyam (OpenAI)
Girish Sastry (OpenAI)
Amanda Askell (OpenAI)
Sandhini Agarwal (OpenAI)
Ariel Herbert-Voss (OpenAI)
Gretchen M Krueger (OpenAI)
Tom Henighan (OpenAI)
Rewon Child (OpenAI)
Aditya Ramesh (OpenAI)
Daniel Ziegler (OpenAI)

I work at OpenAI on AI alignment: how can we make techniques for learning human values that will scale robustly to superhuman learning systems and task performance?

Jeffrey Wu (OpenAI)
Clemens Winter (OpenAI)
Chris Hesse (OpenAI)
Mark Chen (OpenAI)
Eric Sigler (OpenAI)
Mateusz Litwin (OpenAI)
Scott Gray (OpenAI)
Benjamin Chess (OpenAI)
Jack Clark (OpenAI)
Christopher Berner (OpenAI)
Sam McCandlish (OpenAI)
Alec Radford (OpenAI)
Ilya Sutskever (OpenAI)
Dario Amodei (OpenAI)
