Workshop
|
|
In Search of the Successful Interpolation: On the Role of Sharpness in CLIP Generalization
Alireza Abdollahpour
|
|
Workshop
|
|
Interpretable AI in Human-Machine Systems: Insights from Human-in-the-Loop Product Recommendation Engines
Pooria Assadi · Nima Safaei
|
|
Workshop
|
|
Decoding Dark Matter: Specialized Sparse Autoencoders for Interpreting Rare Concepts in LLMs
Aashiq Muhamed · Jake Mendel · Lucius Bushnaq · Mona Diab · Virginia Smith
|
|
Poster
|
Fri 11:00
|
Interpreting Learned Feedback Patterns in Large Language Models
Luke Marks · Amir Abdullah · Clement Neo · Rauno Arike · David Krueger · Philip Torr · Fazl Barez
|
|
Workshop
|
|
Towards Interpretable Scientific Foundation Models: Sparse Autoencoders for Disentangling Dense Embeddings of Scientific Concepts
Charles O'Neill · Christine Ye · Kartheik Iyer · John Wu
|
|
Workshop
|
Sun 16:15
|
Towards Interpretable Scientific Foundation Models: Sparse Autoencoders for Disentangling Dense Embeddings of Scientific Concepts
Charles O'Neill · Christine Ye · Kartheik Iyer · John Wu
|
|
Workshop
|
|
Demo: Harnessing Generative AI for Comprehensive Evaluation of Medical Imaging AI
Yisak Kim · Seunghyun Jang · Soyeon Kim · Kyungmin Jeon · Chang Min Park
|
|
Poster
|
Fri 11:00
|
InterpBench: Semi-Synthetic Transformers for Evaluating Mechanistic Interpretability Techniques
Rohan Gupta · Iván Arcuschin Moreno · Thomas Kwa · Adrià Garriga-Alonso
|
|
Affinity Event
|
|
“Compassionately”: Increasing Plurality Awareness through Community-powered AI
Hala Sheta · Mohamed Ahmed · Syed Ishtiaque Ahmed
|
|
Poster
|
Wed 16:30
|
What Makes and Breaks Safety Fine-tuning? A Mechanistic Study
Samyak Jain · Ekdeep S Lubana · Kemal Oksuz · Tom Joy · Philip Torr · Amartya Sanyal · Puneet Dokania
|
|
Poster
|
Wed 16:30
|
Iteration Head: A Mechanistic Study of Chain-of-Thought
Vivien Cabannes · Charles Arnal · Wassim Bouaziz · Xingyu Yang · Francois Charton · Julia Kempe
|
|
Workshop
|
|
A Cognitive Framework for Learning Debiased and Interpretable Representations via Debiasing Global Workspace
Jinyung Hong · Eun Som Jeon · Changhoon Kim · Keun Hee Park · Utkarsh Nath · 'YZ' Yezhou Yang · Pavan Turaga · Theodore P. Pavlic
|
|