Search All 2024 Events
 

34 Results

Page 2 of 3
Workshop
In Search of the Successful Interpolation: On the Role of Sharpness in CLIP Generalization
Alireza Abdollahpour
Workshop
Interpretable AI in Human-Machine Systems: Insights from Human-in-the-Loop Product Recommendation Engines
Pooria Assadi · Nima Safaei
Workshop
Decoding Dark Matter: Specialized Sparse Autoencoders for Interpreting Rare Concepts in LLMs
Aashiq Muhamed · Jake Mendel · Lucius Bushnaq · Mona Diab · Virginia Smith
Poster
Fri 11:00 Interpreting Learned Feedback Patterns in Large Language Models
Luke Marks · Amir Abdullah · Clement Neo · Rauno Arike · David Krueger · Philip Torr · Fazl Barez
Workshop
Towards Interpretable Scientific Foundation Models: Sparse Autoencoders for Disentangling Dense Embeddings of Scientific Concepts
Charles O'Neill · Christine Ye · Kartheik Iyer · John Wu
Workshop
Sun 16:15 Towards Interpretable Scientific Foundation Models: Sparse Autoencoders for Disentangling Dense Embeddings of Scientific Concepts
Charles O'Neill · Christine Ye · Kartheik Iyer · John Wu
Workshop
Demo: Harnessing Generative AI for Comprehensive Evaluation of Medical Imaging AI
Yisak Kim · Seunghyun Jang · Soyeon Kim · Kyungmin Jeon · Chang Min Park
Poster
Fri 11:00 InterpBench: Semi-Synthetic Transformers for Evaluating Mechanistic Interpretability Techniques
Rohan Gupta · Iván Arcuschin Moreno · Thomas Kwa · Adrià Garriga-Alonso
Affinity Event
“Compassionately”: Increasing Plurality Awareness through Community-powered AI
Hala Sheta · Mohamed Ahmed · Syed Ishtiaque Ahmed
Poster
Wed 16:30 What Makes and Breaks Safety Fine-tuning? A Mechanistic Study
Samyak Jain · Ekdeep S Lubana · Kemal Oksuz · Tom Joy · Philip Torr · Amartya Sanyal · Puneet Dokania
Poster
Wed 16:30 Iteration Head: A Mechanistic Study of Chain-of-Thought
Vivien Cabannes · Charles Arnal · Wassim Bouaziz · Xingyu Yang · Francois Charton · Julia Kempe
Workshop
A Cognitive Framework for Learning Debiased and Interpretable Representations via Debiasing Global Workspace
Jinyung Hong · Eun Som Jeon · Changhoon Kim · Keun Hee Park · Utkarsh Nath · 'YZ' Yezhou Yang · Pavan Turaga · Theodore P. Pavlic