66 Results
Poster
|
Fri 11:00 |
GREAT Score: Global Robustness Evaluation of Adversarial Perturbation using Generative Models Zaitang Li · Pin-Yu Chen · Tsung-Yi Ho |
|
Workshop
|
Robust Feature Learning for Multi-Index Models in High Dimensions Alireza Mousavi-Hosseini · Adel Javanmard · Murat Erdogdu |
||
Workshop
|
Robustness of Practical Perceptual Hashing Algorithms to Hash-Evasion and Hash-Inversion Attacks Jordan Madden · Moxanki Bhavsar · Lhamo Dorje · Xiaohua Li |
||
Workshop
|
Certifying Robustness via Topological Representations Jens Agerberg · Andrea Guidolin · Andrea Martinelli · Pepijn Hoefgeest · David Eklund · Martina Scolamiero |
||
Workshop
|
Sparse patches adversarial attacks via extrapolating point-wise information Yaniv Nemcovsky · Avi Mendelson · Chaim Baskin |
||
Workshop
|
LLM Defenses Are Not Robust to Multi-Turn Human Jailbreaks Yet Nathaniel Li · Ziwen Han · Ian Steneker · Willow Primack · Riley Goodside · Hugh Zhang · Zifan Wang · Cristina Menghini · Summer Yue |
||
Workshop
|
Sun 11:05 |
Contributed Talk 3: LLM Defenses Are Not Robust to Multi-Turn Human Jailbreaks Yet Nathaniel Li · Ziwen Han · Ian Steneker · Willow Primack · Riley Goodside · Hugh Zhang · Zifan Wang · Cristina Menghini · Summer Yue |
|
Workshop
|
Shh, don't say that! Domain Certification in LLMs Cornelius Emde · Preetham Arvind · Alasdair Paren · Maxime Kayser · Thomas Rainforth · Thomas Lukasiewicz · Philip Torr · Adel Bibi |
||
Poster
|
Fri 11:00 |
Defensive Unlearning with Adversarial Training for Robust Concept Erasure in Diffusion Models Yimeng Zhang · Xin Chen · Jinghan Jia · Yihua Zhang · Chongyu Fan · Jiancheng Liu · Mingyi Hong · Ke Ding · Sijia Liu |
|
Workshop
|
Jailbreak Defense in a Narrow Domain: Failures of Existing Methods and Improving Transcript-Based Classifiers Tony Wang · John Hughes · Henry Sleight · Rylan Schaeffer · Rajashree Agrawal · Fazl Barez · Mrinank Sharma · Jesse Mu · Nir Shavit · Ethan Perez |
||
Workshop
|
Jailbreak Defense in a Narrow Domain: Failures of existing methods and Improving Transcript-Based Classifiers Tony Wang · John Hughes · Henry Sleight · Rylan Schaeffer · Rajashree Agrawal · Fazl Barez · Mrinank Sharma · Jesse Mu · Nir Shavit · Ethan Perez |
||
Poster
|
Thu 11:00 |
On the Scalability of Certified Adversarial Robustness with Generated Data Thomas Altstidl · David Dobre · Arthur Kosmala · Bjoern Eskofier · Gauthier Gidel · Leo Schwinn |