Timezone: »

 
Poster
Training language models to follow instructions with human feedback
Long Ouyang · Jeffrey Wu · Xu Jiang · Diogo Almeida · Carroll Wainwright · Pamela Mishkin · Chong Zhang · Sandhini Agarwal · Katarina Slama · Alex Ray · John Schulman · Jacob Hilton · Fraser Kelton · Luke Miller · Maddie Simens · Amanda Askell · Peter Welinder · Paul Christiano · Jan Leike · Ryan Lowe

Wed Nov 30 02:00 PM -- 04:00 PM (PST) @ Hall J #920

Making language models bigger does not inherently make them better at following a user's intent. For example, large language models can generate outputs that are untruthful, toxic, or simply not helpful to the user. In other words, these models are not aligned with their users. In this paper, we show an avenue for aligning language models with user intent on a wide range of tasks by fine-tuning with human feedback. Starting with a set of labeler-written prompts and prompts submitted through a language model API, we collect a dataset of labeler demonstrations of the desired model behavior, which we use to fine-tune GPT-3 using supervised learning. We then collect a dataset of rankings of model outputs, which we use to further fine-tune this supervised model using reinforcement learning from human feedback. We call the resulting models InstructGPT. In human evaluations on our prompt distribution, outputs from the 1.3B parameter InstructGPT model are preferred to outputs from the 175B GPT-3, despite having 100x fewer parameters. Moreover, InstructGPT models show improvements in truthfulness and reductions in toxic output generation while having minimal performance regressions on public NLP datasets. Even though InstructGPT still makes simple mistakes, our results show that fine-tuning with human feedback is a promising direction for aligning language models with human intent.

Author Information

Long Ouyang (OpenAI)
Jeffrey Wu (OpenAI)
Xu Jiang (OpenAI)
Diogo Almeida (OpenAI)
Carroll Wainwright (Partnership on AI)
Pamela Mishkin (OpenAI)
Chong Zhang (OpenAI)
Sandhini Agarwal (OpenAI)
Katarina Slama
Alex Ray (OpenAI)
John Schulman (OpenAI)
Jacob Hilton (OpenAI)
Fraser Kelton
Luke Miller
Maddie Simens
Amanda Askell
Peter Welinder (OpenAI)
Paul Christiano (OpenAI)
Jan Leike (OpenAI)
Ryan Lowe (OpenAI)

More from the Same Authors

  • 2022 : John Schulman »
    John Schulman
  • 2022 : Discussion on Policy Challenges Associated with Generative Models »
    Irene Solaiman · Russell Wald · Yonadav Shavit · Long Ouyang
  • 2022 : Panel on Technical Challenges Associated with Reliable Human Evaluations of Generative Models »
    Long Ouyang · Tongshuang Wu · Zachary Lipton
  • 2022 Poster: Batch size-invariance for policy optimization »
    Jacob Hilton · Karl Cobbe · John Schulman
  • 2020 : Contributed Talk: Asymmetric self-play for automatic goal discovery in robotic manipulation »
    OpenAI Robotics · Matthias Plappert · Raul Sampedro · Tao Xu · Ilge Akkaya · Vineet Kosaraju · Peter Welinder · Ruben D'Sa · Arthur Petron · Henrique Ponde · Alex Paino · Hyeonwoo Noh Noh · Lilian Weng · Qiming Yuan · Casey Chu · Wojciech Zaremba
  • 2020 Poster: Learning to summarize with human feedback »
    Nisan Stiennon · Long Ouyang · Jeffrey Wu · Daniel Ziegler · Ryan Lowe · Chelsea Voss · Alec Radford · Dario Amodei · Paul Christiano
  • 2020 Poster: Language Models are Few-Shot Learners »
    Tom B Brown · Benjamin Mann · Nick Ryder · Melanie Subbiah · Jared Kaplan · Prafulla Dhariwal · Arvind Neelakantan · Pranav Shyam · Girish Sastry · Amanda Askell · Sandhini Agarwal · Ariel Herbert-Voss · Gretchen M Krueger · Tom Henighan · Rewon Child · Aditya Ramesh · Daniel Ziegler · Jeffrey Wu · Clemens Winter · Chris Hesse · Mark Chen · Eric Sigler · Mateusz Litwin · Scott Gray · Benjamin Chess · Jack Clark · Christopher Berner · Sam McCandlish · Alec Radford · Ilya Sutskever · Dario Amodei
  • 2020 Oral: Language Models are Few-Shot Learners »
    Tom B Brown · Benjamin Mann · Nick Ryder · Melanie Subbiah · Jared Kaplan · Prafulla Dhariwal · Arvind Neelakantan · Pranav Shyam · Girish Sastry · Amanda Askell · Sandhini Agarwal · Ariel Herbert-Voss · Gretchen M Krueger · Tom Henighan · Rewon Child · Aditya Ramesh · Daniel Ziegler · Jeffrey Wu · Clemens Winter · Chris Hesse · Mark Chen · Eric Sigler · Mateusz Litwin · Scott Gray · Benjamin Chess · Jack Clark · Christopher Berner · Sam McCandlish · Alec Radford · Ilya Sutskever · Dario Amodei
  • 2019 : Poster Session »
    Matthia Sabatelli · Adam Stooke · Amir Abdi · Paulo Rauber · Leonard Adolphs · Ian Osband · Hardik Meisheri · Karol Kurach · Johannes Ackermann · Matt Benatan · GUO ZHANG · Chen Tessler · Dinghan Shen · Mikayel Samvelyan · Riashat Islam · Murtaza Dalal · Luke Harries · Andrey Kurenkov · Konrad Żołna · Sudeep Dasari · Kristian Hartikainen · Ofir Nachum · Kimin Lee · Markus Holzleitner · Vu Nguyen · Francis Song · Christopher Grimm · Felipe Leno da Silva · Yuping Luo · Yifan Wu · Alex Lee · Thomas Paine · Wei-Yang Qu · Daniel Graves · Yannis Flet-Berliac · Yunhao Tang · Suraj Nair · Matthew Hausknecht · Akhil Bagaria · Simon Schmitt · Bowen Baker · Paavo Parmas · Benjamin Eysenbach · Lisa Lee · Siyu Lin · Daniel Seita · Abhishek Gupta · Riley Simmons-Edler · Yijie Guo · Kevin Corder · Vikash Kumar · Scott Fujimoto · Adam Lerer · Ignasi Clavera Gilaberte · Nicholas Rhinehart · Ashvin Nair · Ge Yang · Lingxiao Wang · Sungryull Sohn · J. Fernando Hernandez-Garcia · Xian Yeow Lee · Rupesh Srivastava · Khimya Khetarpal · Chenjun Xiao · Luckeciano Carvalho Melo · Rishabh Agarwal · Tianhe Yu · Glen Berseth · Devendra Singh Chaplot · Jie Tang · Anirudh Srinivasan · Tharun Kumar Reddy Medini · Aaron Havens · Misha Laskin · Asier Mujika · Rohan Saphal · Joseph Marino · Alex Ray · Joshua Achiam · Ajay Mandlekar · Zhuang Liu · Danijar Hafner · Zhiwen Tang · Ted Xiao · Michael Walton · Jeff Druce · Ferran Alet · Zhang-Wei Hong · Stephanie Chan · Anusha Nagabandi · Hao Liu · Hao Sun · Ge Liu · Dinesh Jayaraman · John Co-Reyes · Sophia Sanborn
  • 2019 : Poster Session »
    Ahana Ghosh · Javad Shafiee · Akhilan Boopathy · Alex Tamkin · Theodoros Vasiloudis · Vedant Nanda · Ali Baheri · Paul Fieguth · Andrew Bennett · Guanya Shi · Hao Liu · Arushi Jain · Jacob Tyo · Benjie Wang · Boxiao Chen · Carroll Wainwright · Chandramouli Shama Sastry · Chao Tang · Daniel S. Brown · David Inouye · David Venuto · Dhruv Ramani · Dimitrios Diochnos · Divyam Madaan · Dmitrii Krashenikov · Joel Oren · Doyup Lee · Eleanor Quint · elmira amirloo · Matteo Pirotta · Gavin Hartnett · Geoffroy Dubourg-Felonneau · Gokul Swamy · Pin-Yu Chen · Ilija Bogunovic · Jason Carter · Javier Garcia-Barcos · Jeet Mohapatra · Jesse Zhang · Jian Qian · John Martin · Oliver Richter · Federico Zaiter · Tsui-Wei Weng · Karthik Abinav Sankararaman · Kyriakos Polymenakos · Lan Hoang · mahdieh abbasi · Marco Gallieri · Mathieu Seurin · Matteo Papini · Matteo Turchetta · Matthew Sotoudeh · Mehrdad Hosseinzadeh · Nathan Fulton · Masatoshi Uehara · Niranjani Prasad · Oana-Maria Camburu · Patrik Kolaric · Philipp Renz · Prateek Jaiswal · Reazul Hasan Russel · Riashat Islam · Rishabh Agarwal · Alexander Aldrick · Sachin Vernekar · Sahin Lale · Sai Kiran Narayanaswami · Samuel Daulton · Sanjam Garg · Sebastian East · Shun Zhang · Soheil Dsidbari · Justin Goodwin · Victoria Krakovna · Wenhao Luo · Wesley Chung · Yuanyuan Shi · Yuh-Shyang Wang · Hongwei Jin · Ziping Xu
  • 2018 : Coffee Break 1 (Posters) »
    Ananya Kumar · Siyu Huang · Huazhe Xu · Michael Janner · Parth Chadha · Nils Thuerey · Peter Lu · Maria Bauza · Anthony Tompkins · Guanya Shi · Thomas Baumeister · André Ofner · Zhi-Qi Cheng · Yuping Luo · Deepika Bablani · Jeroen Vanbaar · Kartic Subr · Tatiana López-Guevara · Devesh Jha · Fabian Fuchs · Stefano Rosa · Alison Pouplin · Alex Ray · Qi Liu · Eric Crawford
  • 2017 : Learning from Human Feedback »
    Paul Christiano
  • 2017 Poster: #Exploration: A Study of Count-Based Exploration for Deep Reinforcement Learning »
    Haoran Tang · Rein Houthooft · Davis Foote · Adam Stooke · OpenAI Xi Chen · Yan Duan · John Schulman · Filip DeTurck · Pieter Abbeel
  • 2017 Poster: Hindsight Experience Replay »
    Marcin Andrychowicz · Filip Wolski · Alex Ray · Jonas Schneider · Rachel Fong · Peter Welinder · Bob McGrew · Josh Tobin · OpenAI Pieter Abbeel · Wojciech Zaremba