Search All 2024 Events
 

73 Results

Workshop
Sun 9:00 Towards Safe & Trustworthy Agents
Alexander Pan · Kimin Lee · Bo Li · Karthik Narasimhan · Dawn Song · Isabelle Barrass
Workshop
MISR: Measuring Instrumental Self-Reasoning in Frontier Models
Kai Fronsdal · David Lindner
Workshop
Instructional Segment Embedding: Improving LLM Safety with Instruction Hierarchy
Tong Wu · Shujian Zhang · Kaiqiang Song · Silei Xu · Sanqiang Zhao · Ravi Agrawal · Sathish Indurthi · Chong Xiang · Prateek Mittal · Wenxuan Zhou
Workshop
Towards Safe Multilingual Frontier AI
Arturs Kanepajs · Vladimir Ivanov · Richard Moulange
Workshop
HAICOSYSTEM: An Ecosystem for Sandboxing Safety Risks in Human-AI Interactions
Xuhui Zhou · Hyunwoo Kim · Faeze Brahman · Liwei Jiang · Hao Zhu · Ximing Lu · Frank F. Xu · Bill Yuchen Lin · Niloofar Mireshghallah · Ronan Le Bras · Maarten Sap
Workshop
Evaluating Synthetic Activations composed of SAE Latents in GPT-2
Nora Petrova · Giorgi Giglemiani · Chatrik Mangat · Jett Janiak · Stefan Heimersheim
Workshop
An Adversarial Perspective on Machine Unlearning for AI Safety
Jakub Łucki · Boyi Wei · Yangsibo Huang · Peter Henderson · Florian Tramer · Javier Rando
Poster
Thu 11:00 'Explaining RL Decisions with Trajectories': A Reproducibility Study
Karim Abdel Sadek · Matteo Nulli · Joan Velja · Jort Vincenti
Workshop
A Safety-aware Framework for Generative Enzyme Design with Foundation Models
Xiaoyi Fu · Tao Han · Yuan Yao · Song Guo
Poster
Thu 11:00 Uncovering, Explaining, and Mitigating the Superficial Safety of Backdoor Defense
Rui Min · Zeyu Qin · Nevin L. Zhang · Li Shen · Minhao Cheng
Workshop
Adversarial Negotiation Dynamics in Generative Language Models
Arinbjörn Kolbeinsson · Benedikt Kolbeinsson