firstbacksecondback
22 Results
Workshop
|
HarmAnalyst: Interpretable, transparent, and steerable LLM safety moderation Jing-Jing Li · Valentina Pyatkin · Max Kleiman-Weiner · Liwei Jiang · Nouha Dziri · Anne Collins · Jana Schaich Borg · Maarten Sap · Yejin Choi · Sydney Levine |
||
Workshop
|
LLM Defenses Are Not Robust to Multi-Turn Human Jailbreaks Yet Nathaniel Li · Ziwen Han · Ian Steneker · Willow Primack · Riley Goodside · Hugh Zhang · Zifan Wang · Cristina Menghini · Summer Yue |
||
Workshop
|
Sun 11:05 |
Contributed Talk 3: LLM Defenses Are Not Robust to Multi-Turn Human Jailbreaks Yet Nathaniel Li · Ziwen Han · Ian Steneker · Willow Primack · Riley Goodside · Hugh Zhang · Zifan Wang · Cristina Menghini · Summer Yue |
|
Workshop
|
SkewAct: Red Teaming Large Language Models via Activation-Skewed Adversarial Prompt Optimization Hanxi Guo · Siyuan Cheng · Guanhong Tao · Guangyu Shen · Zhuo Zhang · Shengwei An · Kaiyuan Zhang · Xiangyu Zhang |
||
Workshop
|
Dissecting Adversarial Robustness of Multimodal LM Agents Chen Wu · Rishi Shah · Jing Yu Koh · Ruslan Salakhutdinov · Daniel Fried · Aditi Raghunathan |
||
Workshop
|
Dissecting Adversarial Robustness of Multimodal LM Agents Chen Wu · Rishi Shah · Jing Yu Koh · Ruslan Salakhutdinov · Daniel Fried · Aditi Raghunathan |
||
Workshop
|
AEGIS2.0: A Diverse AI Safety Dataset and Risks Taxonomy for Alignment of LLM Guardrails Shaona Ghosh · Prasoon Varshney · Makesh Narsimhan Sreedhar · Aishwarya Padmakumar · Traian Rebedea · Jibin Varghese · Christopher Parisien |
||
Workshop
|
Simulation System Towards Solving Societal-Scale Manipulation Maximilian Puelma Touzel · Sneheel Sarangi · Austin Welch · Gayatri K · Dan Zhao · Zachary Yang · Hao Yu · Tom Gibbs · Ethan Kosak-Hine · Andreea Musulan · Camille Thibault · Reihaneh Rabbany · Jean-François Godbout · Kellin Pelrine |
||
Workshop
|
Decompose, Recompose, and Conquer: Multi-modal LLMs are Vulnerable to Compositional Adversarial Attacks in Multi-Image Queries Julius Broomfield · George Ingebretsen · Reihaneh Iranmanesh · Sara Pieri · Ethan Kosak-Hine · Tom Gibbs · Reihaneh Rabbany · Kellin Pelrine |
||
Workshop
|
Decompose, Recompose, and Conquer: Multi-modal LLMs are Vulnerable to Compositional Adversarial Attacks in Multi-Image Queries Julius Broomfield · George Ingebretsen · Reihaneh Iranmanesh · Sara Pieri · Ethan Kosak-Hine · Tom Gibbs · Reihaneh Rabbany · Kellin Pelrine |