55 Results
Workshop
|
Sun 16:50 |
Contributed Talk 6: Infecting LLM Agents via Generalizable Adversarial Attack Weichen Yu · Kai Hu · Tianyu Pang · Chao Du · Min Lin · Matt Fredrikson |
|
Workshop
|
Sun 10:55 |
Contributed Talk 2: Failures to Find Transferable Image Jailbreaks Between Vision-Language Models Rylan Schaeffer · Dan Valentine · Luke Bailey · James Chua · Zane Durante · Cristobal Eyzaguirre · Joe Benton · Brando Miranda · Henry Sleight · Tony Wang · John Hughes · Rajashree Agrawal · Mrinank Sharma · Scott Emmons · Sanmi Koyejo · Ethan Perez |
|
Workshop
|
Jailbreak Defense in a Narrow Domain: Failures of Existing Methods and Improving Transcript-Based Classifiers Tony Wang · John Hughes · Henry Sleight · Rylan Schaeffer · Rajashree Agrawal · Fazl Barez · Mrinank Sharma · Jesse Mu · Nir Shavit · Ethan Perez |
||
Workshop
|
Jailbreak Defense in a Narrow Domain: Failures of Existing Methods and Improving Transcript-Based Classifiers Tony Wang · John Hughes · Henry Sleight · Rylan Schaeffer · Rajashree Agrawal · Fazl Barez · Mrinank Sharma · Jesse Mu · Nir Shavit · Ethan Perez |
||
Workshop
|
Failures to Find Transferable Image Jailbreaks Between Vision-Language Models Rylan Schaeffer · Dan Valentine · Luke Bailey · James Chua · Zane Durante · Cristobal Eyzaguirre · Joe Benton · Brando Miranda · Henry Sleight · Tony Wang · John Hughes · Rajashree Agrawal · Mrinank Sharma · Scott Emmons · Sanmi Koyejo · Ethan Perez |
||
Workshop
|
When Do Universal Image Jailbreaks Transfer Between Vision-Language Models? Rylan Schaeffer · Dan Valentine · Luke Bailey · James Chua · Cristobal Eyzaguirre · Zane Durante · Joe Benton · Brando Miranda · Henry Sleight · Tony Wang · John Hughes · Rajashree Agrawal · Mrinank Sharma · Scott Emmons · Sanmi Koyejo · Ethan Perez |
||
Workshop
|
A Realistic Threat Model for Large Language Model Jailbreaks Valentyn Boreiko · Alexander Panfilov · Vaclav Voracek · Matthias Hein · Jonas Geiping |
||
Workshop
|
Between the Bars: Gradient-based Jailbreaks are Bugs that induce Features Kaivalya Hariharan · Uzay Girit |
||
Workshop
|
Jailbreaking Large Language Models with Symbolic Mathematics Emet Bethany · Mazal Bethany · Juan Nolazco-Flores · Sumit Jha · Peyman Najafirad |
||
Workshop
|
Towards Safe Multilingual Frontier AI Arturs Kanepajs · Vladimir Ivanov · Richard Moulange |
||
Workshop
|
Plentiful Jailbreaks with String Compositions Brian Huang |