Skip to yearly menu bar Skip to main content


Poster

WildTeaming at Scale: From In-the-Wild Jailbreaks to (Adversarially) Safer Language Models

Liwei Jiang · Kavel Rao · Seungju Han · Allyson Ettinger · Faeze Brahman · Sachin Kumar · Niloofar Mireshghallah · Ximing Lu · Maarten Sap · Yejin Choi · Nouha Dziri
2024 Poster

Abstract

Video

Chat is not available.