Search All 2024 Events
 

126 Results

Page 10 of 11
Workshop
Sun 11:20 The Evolution of Statistical Induction Heads: In-Context Learning Markov Chains
Ezra Edelman · Nikolaos Tsilivis · Surbhi Goel · Benjamin Edelman · Eran Malach
Workshop
MISR: Measuring Instrumental Self-Reasoning in Frontier Models
Kai Fronsdal · David Lindner
Workshop
An Adversarial Perspective on Machine Unlearning for AI Safety
Jakub Łucki · Boyi Wei · Yangsibo Huang · Peter Henderson · Florian Tramer · Javier Rando
Workshop
Evaluating Synthetic Activations composed of SAE Latents in GPT-2
Nora Petrova · Giorgi Giglemiani · Chatrik Mangat · Jett Janiak · Stefan Heimersheim
Workshop
Code-Switching Red-Teaming: LLM Evaluation for Safety and Multilingual Understanding
Haneul Yoo · Yongjin Yang · Hwaran Lee
Workshop
Towards Safe Multilingual Frontier AI
Arturs Kanepajs · Vladimir Ivanov · Richard Moulange
Workshop
A Cautionary Tale on the Evaluation of Differentially Private In-Context Learning
Anjun Hu · Jiyang Guan · Philip Torr · Francesco Pinto
Workshop
Plentiful Jailbreaks with String Compositions
Brian Huang
Workshop
Measuring AI Agent Autonomy: Towards a Scalable Approach With Code Inspection
Merlin Stein · Peter Cihon · Gagan Bansal · Sam Manning
Workshop
How Does LLM Compression Affect Weight Exfiltration Attacks?
Davis Brown · Mantas Mazeika
Workshop
Instructional Segment Embedding: Improving LLM Safety with Instruction Hierarchy
Tong Wu · Shujian Zhang · Kaiqiang Song · Silei Xu · Sanqiang Zhao · Ravi Agrawal · Sathish Indurthi · Chong Xiang · Prateek Mittal · Wenxuan Zhou
Poster
SG-Bench: Evaluating LLM Safety Generalization Across Diverse Tasks and Prompt Types
Yutao Mou · Shikun Zhang · Wei Ye