All 2024 Events: 55 results, page 3 of 5
Workshop
Between the Bars: Gradient-based Jailbreaks are Bugs that induce Features
Kaivalya Hariharan · Uzay Girit
Workshop
Logicbreaks: A Framework for Understanding Subversion of Rule-based Inference
Anton Xue · Avishree Khare · Rajeev Alur · Surbhi Goel · Eric Wong
Workshop
Latent Adversarial Training Improves Robustness to Persistent Harmful Behaviors in LLMs
Aidan Ewart · Abhay Sheshadri · Phillip Guo · Aengus Lynch · Cindy Wu · Vivek Hebbar · Henry Sleight · Asa Cooper Stickland · Ethan Perez · Dylan Hadfield-Menell · Stephen Casper
Workshop
Advancing Adversarial Suffix Transfer Learning on Aligned Large Language Models
Hongfu Liu · Yuxi Xie · Ye Wang · Michael Qizhe Shieh
Workshop
Sat 10:45 When Do Universal Image Jailbreaks Transfer Between Vision-Language Models?
Rylan Schaeffer · Dan Valentine · Luke Bailey · James Chua · Cristobal Eyzaguirre · Zane Durante · Joe Benton · Brando Miranda · Henry Sleight · Tony Wang · John Hughes · Rajashree Agrawal · Mrinank Sharma · Scott Emmons · Sanmi Koyejo · Ethan Perez
Workshop
Failures to Find Transferable Image Jailbreaks Between Vision-Language Models
Rylan Schaeffer · Dan Valentine · Luke Bailey · James Chua · Zane Durante · Cristobal Eyzaguirre · Joe Benton · Brando Miranda · Henry Sleight · Tony Wang · John Hughes · Rajashree Agrawal · Mrinank Sharma · Scott Emmons · Sanmi Koyejo · Ethan Perez
Workshop
Decompose, Recompose, and Conquer: Multi-modal LLMs are Vulnerable to Compositional Adversarial Attacks in Multi-Image Queries
Julius Broomfield · George Ingebretsen · Reihaneh Iranmanesh · Sara Pieri · Ethan Kosak-Hine · Tom Gibbs · Reihaneh Rabbany · Kellin Pelrine
Workshop
Attack Atlas: A Practitioner's Perspective on Challenges and Pitfalls in Red Teaming GenAI
Ambrish Rawat · Stefan Schoepf · Giulio Zizzo · Giandomenico Cornacchia · Muhammad Zaid Hameed · Kieran Fraser · Erik Miehling · Beat Buesser · Elizabeth Daly · Mark Purcell · Prasanna Sattigeri · Pin-Yu Chen · Kush Varshney
Workshop
Plentiful Jailbreaks with String Compositions
Brian Huang
Workshop
Does Refusal Training in LLMs Generalize to the Past Tense?
Maksym Andriushchenko · Nicolas Flammarion
Workshop
Universal Jailbreak Backdoors in Large Language Model Alignment
Thomas Baumann