Search All 2024 Events
 

20 Results

Workshop
Ablation is Not Enough to Emulate DPO: A Mechanistic Analysis of Toxicity Reduction
Yushi Yang · Filip Sondej · Harry Mayne · Adam Mahdi
Workshop
Sun 16:15 Towards Interpretable Scientific Foundation Models: Sparse Autoencoders for Disentangling Dense Embeddings of Scientific Concepts
Charles O'Neill · Christine Ye · Kartheik Iyer · John Wu
Workshop
Towards Interpretable Scientific Foundation Models: Sparse Autoencoders for Disentangling Dense Embeddings of Scientific Concepts
Charles O'Neill · Christine Ye · Kartheik Iyer · John Wu
Poster
Wed 16:30 What Makes and Breaks Safety Fine-tuning? A Mechanistic Study
Samyak Jain · Ekdeep S Lubana · Kemal Oksuz · Tom Joy · Philip Torr · Amartya Sanyal · Puneet Dokania
Workshop
Pay Attention to What Matters
Pedro Silva · Fadhel Ayed · Antonio De Domenico · Ali Maatouk
Workshop
Measuring the Reliability of Causal Probing Methods: Tradeoffs, Limitations, and the Plight of Nullifying Interventions
Marc Canby · Adam Davies · Chirag Rastogi · Julia C Hockenmaier
Workshop
Competence-Based Analysis of Language Models
Adam Davies · Jize Jiang · Cheng Xiang Zhai
Workshop
Uncovering Uncertainty in Transformer Inference
Greyson Brothers · Willa Mannering · John Winder · Amber Tien