Search All 2024 Events
 

55 Results

Page 1 of 5
Poster
Wed 11:00 Improved Few-Shot Jailbreaking Can Circumvent Aligned Language Models and Their Defenses
Xiaosen Zheng · Tianyu Pang · Chao Du · Qian Liu · Jing Jiang · Min Lin
Poster
Fri 11:00 JailbreakBench: An Open Robustness Benchmark for Jailbreaking Large Language Models
Patrick Chao · Edoardo Debenedetti · Alexander Robey · Maksym Andriushchenko · Francesco Croce · Vikash Sehwag · Edgar Dobriban · Nicolas Flammarion · George J. Pappas · Florian Tramer · Hamed Hassani · Eric Wong
Poster
Thu 16:30 A StrongREJECT for Empty Jailbreaks
Alexandra Souly · Qingyuan Lu · Dillon Bowen · Tu Trinh · Elvis Hsieh · Sana Pandey · Pieter Abbeel · Justin Svegliato · Scott Emmons · Olivia Watkins · Sam Toyer
Poster
ColJailBreak: Collaborative Generation and Editing for Jailbreaking Text-to-Image Deep Generation
Yizhuo Ma · Shanmin Pang · Qi Guo · Tianyu Wei · Qing Guo
Poster
Thu 11:00 Bag of Tricks: Benchmarking of Jailbreak Attacks on LLMs
Zhao Xu · Fan Liu · Hao Liu
Poster
Thu 16:30 WildGuard: Open One-stop Moderation Tools for Safety Risks, Jailbreaks, and Refusals of LLMs
Seungju Han · Kavel Rao · Allyson Ettinger · Liwei Jiang · Bill Yuchen Lin · Nathan Lambert · Yejin Choi · Nouha Dziri
Poster
Thu 11:00 Mission Impossible: A Statistical Perspective on Jailbreaking LLMs
Jingtong Su · Julia Kempe · Karen Ullrich
Poster
Thu 16:30 Fight Back Against Jailbreaking via Prompt Adversarial Tuning
Yichuan Mo · Yuji Wang · Zeming Wei · Yisen Wang
Poster
Thu 11:00 BackdoorAlign: Mitigating Fine-tuning based Jailbreak Attack with Backdoor Enhanced Safety Alignment
Jiongxiao Wang · Jiazhao Li · Yiquan Li · Xiangyu Qi · Junjie Hu · Sharon Li · Patrick McDaniel · Muhao Chen · Bo Li · Chaowei Xiao
Poster
Fri 16:30 Jailbreaking Large Language Models Against Moderation Guardrails via Cipher Characters
Haibo Jin · Andy Zhou · Joe Menke · Haohan Wang
Poster
Thu 11:00 Tree of Attacks: Jailbreaking Black-Box LLMs Automatically
Anay Mehrotra · Manolis Zampetakis · Paul Kassianik · Blaine Nelson · Hyrum Anderson · Yaron Singer · Amin Karbasi
Poster
Thu 16:30 Robust Prompt Optimization for Defending Language Models Against Jailbreaking Attacks
Andy Zhou · Bo Li · Haohan Wang