Poster
Fri 16:30 Rule Based Rewards for Language Model Safety
Tong Mu · Alec Helyar · Johannes Heidecke · Joshua Achiam · Andrea Vallone · Ian Kivlichan · Molly Lin · Alex Beutel · John Schulman · Lilian Weng
Poster
Fri 16:30 One-Shot Safety Alignment for Large Language Models via Optimal Dualization
Xinmeng Huang · Shuo Li · Edgar Dobriban · Osbert Bastani · Hamed Hassani · Dongsheng Ding
Workshop
Cascaded to End-to-End: New Safety, Security, and Evaluation Questions for Audio Language Models
Luxi He · Xiangyu Qi · Inyoung Cheong · Prateek Mittal · Danqi Chen · Peter Henderson
Workshop
Sun 10:55 Contributed Talk 2: Failures to Find Transferable Image Jailbreaks Between Vision-Language Models
Rylan Schaeffer · Dan Valentine · Luke Bailey · James Chua · Zane Durante · Cristobal Eyzaguirre · Joe Benton · Brando Miranda · Henry Sleight · Tony Wang · John Hughes · Rajashree Agrawal · Mrinank Sharma · Scott Emmons · Sanmi Koyejo · Ethan Perez
Workshop
Forget to Flourish: Leveraging Machine-Unlearning on Pretrained Language Models for Privacy Leakage
Rafi Rashid · Jing Liu · Toshiaki Koike-Akino · Shagufta Mehnaz · Ye Wang
Workshop
A Formal Framework for Assessing and Mitigating Emergent Security Risks in Generative AI Models: Bridging Theory and Dynamic Risk Mitigation
Aviral Srivastava · Sourav Panda
Workshop
Learning Diverse Attacks on Large Language Models for Robust Red-Teaming and Safety Tuning
Seanie Lee · Minsu Kim · Lynn Cherif · David Dobre · Juho Lee · Sung Ju Hwang · Kenji Kawaguchi · Gauthier Gidel · Yoshua Bengio · Nikolay Malkin · Moksh Jain
Poster
Fri 11:00 Safe LoRA: The Silver Lining of Reducing Safety Risks when Finetuning Large Language Models
Chia-Yi Hsu · Yu-Lin Tsai · Chih-Hsun Lin · Pin-Yu Chen · Chia-Mu Yu · Chun-Ying Huang
Workshop
Decompose, Recompose, and Conquer: Multi-modal LLMs are Vulnerable to Compositional Adversarial Attacks in Multi-Image Queries
Julius Broomfield · George Ingebretsen · Reihaneh Iranmanesh · Sara Pieri · Ethan Kosak-Hine · Tom Gibbs · Reihaneh Rabbany · Kellin Pelrine
Workshop
Mitigating Downstream Model Risks via Model Provenance
Keyu Wang · Scott Schaffter · Abdullah Norozi Iranzad · Doina Precup · Jonathan Lebensold · Megan Risdal