Workshop

Red Teaming GenAI: What Can We Learn from Adversaries?

Valeriia Cherepanova · Bo Li · Niv Cohen · Yifei Wang · Yisen Wang · Avital Shafran · Nil-Jana Akpinar · James Zou

Meeting 301

Sun 15 Dec, 8:15 a.m. PST

The development and proliferation of modern generative AI models have introduced valuable capabilities, but these models and their applications also pose risks to human safety. How do we identify risks in new systems before they cause harm in deployment? This workshop focuses on red teaming, an emergent adversarial approach to probing model behaviors, and its application to making modern generative AI safe for humans.
