

Search All 2024 Events
 

Workshop
Learning Diverse Attacks on Large Language Models for Robust Red-Teaming and Safety Tuning
Seanie Lee · Minsu Kim · Lynn Cherif · David Dobre · Juho Lee · Sung Ju Hwang · Kenji Kawaguchi · Gauthier Gidel · Yoshua Bengio · Nikolay Malkin · Moksh Jain
Workshop
Imitation Guided Automated Red Teaming
Sajad Mousavi · Desik Rengarajan · Ashwin Ramesh Babu · Vineet Gundecha · Antonio Guillen-Perez · Ricardo Luna Gutierrez · Avisek Naug · Sahand Ghorbanpour · Soumyendu Sarkar
Workshop
iART - Imitation guided Automated Red Teaming
Sajad Mousavi · Desik Rengarajan · Ashwin Ramesh Babu · Vineet Gundecha · Avisek Naug · Sahand Ghorbanpour · Ricardo Luna Gutierrez · Antonio Guillen-Perez · Paolo Faraboschi · Soumyendu Sarkar
Workshop
A Formal Framework for Assessing and Mitigating Emergent Security Risks in Generative AI Models: Bridging Theory and Dynamic Risk Mitigation
Aviral Srivastava · Sourav Panda
Workshop
Sun 10:45 Contributed Talk 1: iART - Imitation guided Automated Red Teaming
Sajad Mousavi · Desik Rengarajan · Ashwin Ramesh Babu · Vineet Gundecha · Avisek Naug · Sahand Ghorbanpour · Ricardo Luna Gutierrez · Antonio Guillen-Perez · Paolo Faraboschi · Soumyendu Sarkar
Workshop
Sun 9:00 Red Teaming GenAI: What Can We Learn from Adversaries?
Valeriia Cherepanova · Bo Li · Niv Cohen · Yifei Wang · Yisen Wang · Avital Shafran · Nil-Jana Akpinar · James Zou
Workshop
Decompose, Recompose, and Conquer: Multi-modal LLMs are Vulnerable to Compositional Adversarial Attacks in Multi-Image Queries
Julius Broomfield · George Ingebretsen · Reihaneh Iranmanesh · Sara Pieri · Ethan Kosak-Hine · Tom Gibbs · Reihaneh Rabbany · Kellin Pelrine
Workshop
Curiosity-driven Red teaming for Large Language Models
Zhang-Wei Hong · Idan Shenfeld · Tsun-Hsuan Johnson Wang · Yung-Sung Chuang · Aldo Pareja · Jim Glass · Akash Srivastava · Pulkit Agrawal
Workshop
Red Teaming Language-Conditioned Robot Models via Vision Language Models
Sathwik Karnik · Zhang-Wei Hong · Nishant Abhangi · Yen-Chen Lin · Tsun-Hsuan Johnson Wang · Pulkit Agrawal
Workshop
Red Teaming: Everything Everywhere All at Once
Alexandra Chouldechova · A. Feder Cooper · Abhinav Palia · Dan Vann · Chad Atalla · Hannah Washington · Emily Sheng · Hanna Wallach
Workshop
Diverse and Effective Red Teaming with Auto-generated Rewards and Multi-step Reinforcement Learning
Alex Beutel · Kai Xiao · Johannes Heidecke · Lilian Weng
Workshop
Imitation Guided Automated Red Teaming
Sajad Mousavi · Desik Rengarajan · Ashwin Ramesh Babu · Vineet Gundecha · Soumyendu Sarkar