Search All 2024 Events
11 Results

Workshop
Learning Diverse Attacks on Large Language Models for Robust Red-Teaming and Safety Tuning
Seanie Lee · Minsu Kim · Lynn Cherif · David Dobre · Juho Lee · Sung Ju Hwang · Kenji Kawaguchi · Gauthier Gidel · Yoshua Bengio · Nikolay Malkin · Moksh Jain
Workshop
Safety-Aware Fine-Tuning of Large Language Models
Hyeong Kyu Choi · Xuefeng Du · Sharon Li
Workshop
Representation Tuning
Christopher Ackerman
Workshop
Sun 11:21 Emerging Safety Attack and Defense in Federated Instruction Tuning of Large Language Models
Rui Ye · Jingyi Chai · Xiangrui Liu · Yaodong Yang · Yanfeng Wang · Siheng Chen
Workshop
Sat 15:45 vTune: Verifiable Fine-Tuning Through Backdooring
Eva Zhang · Akilesh Potti · Micah Goldblum
Workshop
Can Safety Fine-Tuning Be More Principled? Lessons Learned from Cybersecurity
David Williams-King · Linh Le · Adam Oberman · Yoshua Bengio
Poster
Thu 11:00 BackdoorAlign: Mitigating Fine-tuning based Jailbreak Attack with Backdoor Enhanced Safety Alignment
Jiongxiao Wang · Jiazhao Li · Yiquan Li · Xiangyu Qi · Junjie Hu · Sharon Li · Patrick McDaniel · Muhao Chen · Bo Li · Chaowei Xiao
Poster
Wed 16:30 What Makes and Breaks Safety Fine-tuning? A Mechanistic Study
Samyak Jain · Ekdeep S Lubana · Kemal Oksuz · Tom Joy · Philip Torr · Amartya Sanyal · Puneet Dokania
Workshop
Emerging Safety Attack and Defense in Federated Instruction Tuning of Large Language Models
Rui Ye · Jingyi Chai · Xiangrui Liu · Yaodong Yang · Yanfeng Wang · Siheng Chen
Workshop
Preserving Safety in Fine-Tuned Large Language Models: A Systematic Evaluation and Mitigation Strategy
Tsung-Huan Yang · Ko-Wei Huang · Yung-Hui Li · Lun-Wei Ku
Poster
Fri 16:30 Lisa: Lazy Safety Alignment for Large Language Models against Harmful Fine-tuning Attack
Tiansheng Huang · Sihao Hu · Fatih Ilhan · Selim Tekin · Ling Liu