Search All 2024 Events
 

55 Results

Workshop
Sun 16:50 Contributed Talk 6: Infecting LLM Agents via Generalizable Adversarial Attack
Weichen Yu · Kai Hu · Tianyu Pang · Chao Du · Min Lin · Matt Fredrikson
Workshop
Sun 10:55 Contributed Talk 2: Failures to Find Transferable Image Jailbreaks Between Vision-Language Models
Rylan Schaeffer · Dan Valentine · Luke Bailey · James Chua · Zane Durante · Cristobal Eyzaguirre · Joe Benton · Brando Miranda · Henry Sleight · Tony Wang · John Hughes · Rajashree Agrawal · Mrinank Sharma · Scott Emmons · Sanmi Koyejo · Ethan Perez
Workshop
Jailbreak Defense in a Narrow Domain: Failures of Existing Methods and Improving Transcript-Based Classifiers
Tony Wang · John Hughes · Henry Sleight · Rylan Schaeffer · Rajashree Agrawal · Fazl Barez · Mrinank Sharma · Jesse Mu · Nir Shavit · Ethan Perez
Workshop
Failures to Find Transferable Image Jailbreaks Between Vision-Language Models
Rylan Schaeffer · Dan Valentine · Luke Bailey · James Chua · Zane Durante · Cristobal Eyzaguirre · Joe Benton · Brando Miranda · Henry Sleight · Tony Wang · John Hughes · Rajashree Agrawal · Mrinank Sharma · Scott Emmons · Sanmi Koyejo · Ethan Perez
Workshop
When Do Universal Image Jailbreaks Transfer Between Vision-Language Models?
Rylan Schaeffer · Dan Valentine · Luke Bailey · James Chua · Cristobal Eyzaguirre · Zane Durante · Joe Benton · Brando Miranda · Henry Sleight · Tony Wang · John Hughes · Rajashree Agrawal · Mrinank Sharma · Scott Emmons · Sanmi Koyejo · Ethan Perez
Workshop
A Realistic Threat Model for Large Language Model Jailbreaks
Valentyn Boreiko · Alexander Panfilov · Vaclav Voracek · Matthias Hein · Jonas Geiping
Workshop
Between the Bars: Gradient-based Jailbreaks are Bugs that induce Features
Kaivalya Hariharan · Uzay Girit
Workshop
Jailbreaking Large Language Models with Symbolic Mathematics
Emet Bethany · Mazal Bethany · Juan Nolazco-Flores · Sumit Jha · Peyman Najafirad
Workshop
Towards Safe Multilingual Frontier AI
Arturs Kanepajs · Vladimir Ivanov · Richard Moulange
Workshop
Plentiful Jailbreaks with String Compositions
Brian Huang