126 Results
Type | Time | Title | Authors
Workshop | | Model Developmental Safety: A Safety-Centric Method and Applications in Vision-Language Models | Gang Li · Wendi Yu · Yao Yao · Wei Tong · Yingbin Liang · Qihang Lin · Tianbao Yang
Poster | Wed 11:00 | Improved Generation of Adversarial Examples Against Safety-aligned LLMs | Qizhang Li · Yiwen Guo · Wangmeng Zuo · Hao Chen
Workshop | | Safety-Aware Fine-Tuning of Large Language Models | Hyeong Kyu Choi · Xuefeng Du · Sharon Li
Workshop | | Combining Domain and Alignment Vectors to Achieve Better Knowledge-Safety Trade-offs in LLMs | Megh Thakkar · Yash More · Quentin Fournier · Matthew Riemer · Pin-Yu Chen · Amal Zouaq · Payel Das · Sarath Chandar
Workshop | | Learning Diverse Attacks on Large Language Models for Robust Red-Teaming and Safety Tuning | Seanie Lee · Minsu Kim · Lynn Cherif · David Dobre · Juho Lee · Sung Ju Hwang · Kenji Kawaguchi · Gauthier Gidel · Yoshua Bengio · Nikolay Malkin · Moksh Jain
Workshop | | A Safety-aware Framework for Generative Enzyme Design with Foundation Models | Xiaoyi Fu · Tao Han · Yuan Yao · Song Guo
Poster | Thu 11:00 | 'Explaining RL Decisions with Trajectories': A Reproducibility Study | Karim Abdel Sadek · Matteo Nulli · Joan Velja · Jort Vincenti
Affinity Event | | Network Inversion of Convolutional Neural Nets | Pirzada Suhail · Amit Sethi
Session | Fri 15:30 | Overflow for Oral Session 6B: Safety, New Data |
Poster | Fri 16:30 | T2VSafetyBench: Evaluating the Safety of Text-to-Video Generative Models | Yibo Miao · Yifan Zhu · Lijia Yu · Jun Zhu · Xiao-Shan Gao · Yinpeng Dong
Poster | Fri 16:30 | SafeSora: Towards Safety Alignment of Text2Video Generation via a Human Preference Dataset | Juntao Dai · Tianle Chen · Xuyao Wang · Ziran Yang · Taiye Chen · Jiaming Ji · Yaodong Yang
Poster | Thu 11:00 | MedSafetyBench: Evaluating and Improving the Medical Safety of Large Language Models | Tessa Han · Aounon Kumar · Chirag Agarwal · Himabindu Lakkaraju