34 Results
Workshop
|
Learning Diverse Attacks on Large Language Models for Robust Red-Teaming and Safety Tuning Seanie Lee · Minsu Kim · Lynn Cherif · David Dobre · Juho Lee · Sung Ju Hwang · Kenji Kawaguchi · Gauthier Gidel · Yoshua Bengio · Nikolay Malkin · Moksh Jain |
||
Workshop
|
Imitation Guided Automated Red Teaming Sajad Mousavi · Desik Rengarajan · Ashwin Ramesh Babu · Vineet Gundecha · Antonio Guillen-Perez · Ricardo Luna Gutierrez · Avisek Naug · Sahand Ghorbanpour · Soumyendu Sarkar |
||
Workshop
|
iART - Imitation guided Automated Red Teaming Sajad Mousavi · Desik Rengarajan · Ashwin Ramesh Babu · Vineet Gundecha · Avisek Naug · Sahand Ghorbanpour · Ricardo Luna Gutierrez · Antonio Guillen-Perez · Paolo Faraboschi · Soumyendu Sarkar |
||
Workshop
|
A Formal Framework for Assessing and Mitigating Emergent Security Risks in Generative AI Models: Bridging Theory and Dynamic Risk Mitigation Aviral Srivastava · Sourav Panda |
||
Workshop
|
Sun 10:45 |
Contributed Talk 1: iART - Imitation guided Automated Red Teaming Sajad Mousavi · Desik Rengarajan · Ashwin Ramesh Babu · Vineet Gundecha · Avisek Naug · Sahand Ghorbanpour · Ricardo Luna Gutierrez · Antonio Guillen-Perez · Paolo Faraboschi · Soumyendu Sarkar |
|
Workshop
|
Sun 9:00 |
Red Teaming GenAI: What Can We Learn from Adversaries? Valeriia Cherepanova · Bo Li · Niv Cohen · Yifei Wang · Yisen Wang · Avital Shafran · Nil-Jana Akpinar · James Zou |
|
Workshop
|
Decompose, Recompose, and Conquer: Multi-modal LLMs are Vulnerable to Compositional Adversarial Attacks in Multi-Image Queries Julius Broomfield · George Ingebretsen · Reihaneh Iranmanesh · Sara Pieri · Ethan Kosak-Hine · Tom Gibbs · Reihaneh Rabbany · Kellin Pelrine |
||
Workshop
|
Curiosity-driven Red teaming for Large Language Models Zhang-Wei Hong · Idan Shenfeld · Tsun-Hsuan Johnson Wang · Yung-Sung Chuang · Aldo Pareja · Jim Glass · Akash Srivastava · Pulkit Agrawal |
||
Workshop
|
Red Teaming Language-Conditioned Robot Models via Vision Language Models Sathwik Karnik · Zhang-Wei Hong · Nishant Abhangi · Yen-Chen Lin · Tsun-Hsuan Johnson Wang · Pulkit Agrawal |
||
Workshop
|
Red Teaming: Everything Everywhere All at Once Alexandra Chouldechova · A. Feder Cooper · Abhinav Palia · Dan Vann · Chad Atalla · Hannah Washington · Emily Sheng · Hanna Wallach |
||
Workshop
|
Diverse and Effective Red Teaming with Auto-generated Rewards and Multi-step Reinforcement Learning Alex Beutel · Kai Xiao · Johannes Heidecke · Lilian Weng |
||
Workshop
|
Imitation Guided Automated Red Teaming Sajad Mousavi · Desik Rengarajan · Ashwin Ramesh Babu · Vineet Gundecha · Soumyendu Sarkar |