

Language Models Rate Their Own Actions As Safer

Dipika Khullar ⋅ Jack Hopkins ⋅ Rowan Wang ⋅ Fabien Roger

Abstract
