55 Results
Workshop | Retention Score: Quantifying Jailbreak Risks for Vision Language Models | ZAITANG LI · Pin-Yu Chen · Tsung-Yi Ho
Workshop | Token Highlighter: Inspecting and Mitigating Jailbreak Prompts for Large Language Models | Xiaomeng Hu · Pin-Yu Chen · Tsung-Yi Ho
Workshop | Zer0-Jack: A memory-efficient gradient-based jailbreaking method for black box Multi-modal Large Language Models | Tiejin Chen · Kaishen Wang · Hua Wei
Workshop | DeepInception: Hypnotize Large Language Model to Be Jailbreaker | Xuan Li · Zhanke Zhou · Jianing Zhu · Jiangchao Yao · Tongliang Liu · Bo Han
Workshop | LLM Improvement for Jailbreak Defense: Analysis Through the Lens of Over-Refusal | Swetasudha Panda · Naveen Jafer Nizar · Michael Wick
Workshop | Testing the Limits of Jailbreaking with the Purple Problem | Taeyoun Kim · Suhas Kotha · Aditi Raghunathan
Workshop | AutoDefense: Multi-Agent LLM Defense against Jailbreak Attacks | Yifan Zeng · Yiran Wu · Xiao Zhang · Huazheng Wang · Qingyun Wu