

Search All 2024 Events
 

34 Results

Poster
Thu 11:00 Analysing the Generalisation and Reliability of Steering Vectors
Daniel Tan · David Chanin · Aengus Lynch · Brooks Paige · Dimitrios Kanoulas · Adrià Garriga-Alonso · Robert Kirk
Poster
Thu 16:30 Hypothesis Testing the Circuit Hypothesis in LLMs
Claudia Shi · Nicolas Beltran Velez · Achille Nazaret · Carolina Zheng · Adrià Garriga-Alonso · Andrew Jesson · Maggie Makar · David Blei
Poster
Thu 11:00 CoSy: Evaluating Textual Explanations of Neurons
Laura Kopf · Philine L Bommer · Anna Hedström · Sebastian Lapuschkin · Marina Höhne · Kirill Bykov
Poster
Fri 16:30 Multilinear Mixture of Experts: Scalable Expert Specialization through Factorization
James Oldfield · Markos Georgopoulos · Grigorios Chrysos · Christos Tzelepis · Yannis Panagakis · Mihalis Nicolaou · Jiankang Deng · Ioannis Patras
Workshop
HarmAnalyst: Interpretable, transparent, and steerable LLM safety moderation
Jing-Jing Li · Valentina Pyatkin · Max Kleiman-Weiner · Liwei Jiang · Nouha Dziri · Anne Collins · Jana Schaich Borg · Maarten Sap · Yejin Choi · Sydney Levine
Workshop
Quanda: An Interpretability Toolkit for Training Data Attribution Evaluation and Beyond
Dilyara Bareeva · Galip Ümit Yolcu · Anna Hedström · Niklas Schmolenski · Thomas Wiegand · Wojciech Samek · Sebastian Lapuschkin
Workshop
Sun 8:50 Interpretable AI: Past, Present and Future
Suraj Srinivas · Michal Moshkovitz · Chhavi Yadav · Lesia Semenova · Nave Frost · Vinayak Abrol · Bitya Neuhof · Valentyn Boreiko · Dotan Di Castro · Himabindu Lakkaraju · Kamalika Chaudhuri
Workshop
Sat 15:45 Interactive Semantic Interventions for VLMs: A Human-in-the-Loop Approach to Interpretability
Lukas Klein · Kenza Amara · Carsten Lüth · Hendrik Strobelt · Mennatallah El-Assady · Paul Jaeger
Workshop
Interactive Semantic Interventions for VLMs: A Human-in-the-Loop Approach to Interpretability
Lukas Klein · Kenza Amara · Carsten Lüth · Antonio Foncubierta-Rodriguez · Hendrik Strobelt · Mennatallah El-Assady · Paul Jaeger