34 Results
Poster
|
Thu 11:00 |
Analysing the Generalisation and Reliability of Steering Vectors Daniel Tan · David Chanin · Aengus Lynch · Brooks Paige · Dimitrios Kanoulas · Adrià Garriga-Alonso · Robert Kirk |
|
Poster
|
Thu 16:30 |
Hypothesis Testing the Circuit Hypothesis in LLMs Claudia Shi · Nicolas Beltran Velez · Achille Nazaret · Carolina Zheng · Adrià Garriga-Alonso · Andrew Jesson · Maggie Makar · David Blei |
|
Poster
|
Thu 11:00 |
CoSy: Evaluating Textual Explanations of Neurons Laura Kopf · Philine L Bommer · Anna Hedström · Sebastian Lapuschkin · Marina Höhne · Kirill Bykov |
|
Poster
|
Fri 16:30 |
Multilinear Mixture of Experts: Scalable Expert Specialization through Factorization James Oldfield · Markos Georgopoulos · Grigorios Chrysos · Christos Tzelepis · Yannis Panagakis · Mihalis Nicolaou · Jiankang Deng · Ioannis Patras |
|
Workshop
|
HarmAnalyst: Interpretable, transparent, and steerable LLM safety moderation Jing-Jing Li · Valentina Pyatkin · Max Kleiman-Weiner · Liwei Jiang · Nouha Dziri · Anne Collins · Jana Schaich Borg · Maarten Sap · Yejin Choi · Sydney Levine |
||
Workshop
|
Quanda: An Interpretability Toolkit for Training Data Attribution Evaluation and Beyond Dilyara Bareeva · Galip Ümit Yolcu · Anna Hedström · Niklas Schmolenski · Thomas Wiegand · Wojciech Samek · Sebastian Lapuschkin |
||
Workshop
|
Sun 8:50 |
Interpretable AI: Past, Present and Future Suraj Srinivas · Michal Moshkovitz · Chhavi Yadav · Lesia Semenova · Nave Frost · Vinayak Abrol · Bitya Neuhof · Valentyn Boreiko · Dotan Di Castro · Himabindu Lakkaraju · Kamalika Chaudhuri |
|
Workshop
|
Interactive Semantic Interventions for VLMs: A Human-in-the-Loop Approach to Interpretability Lukas Klein · Kenza Amara · Carsten Lüth · Hendrik Strobelt · Mennatallah El-Assady · Paul Jaeger |
||
Workshop
|
Sat 15:45 |
Interactive Semantic Interventions for VLMs: A Human-in-the-Loop Approach to Interpretability Lukas Klein · Kenza Amara · Carsten Lüth · Hendrik Strobelt · Mennatallah El-Assady · Paul Jaeger |
|
Workshop
|
Interactive Semantic Interventions for VLMs: A Human-in-the-Loop Approach to Interpretability Lukas Klein · Kenza Amara · Carsten Lüth · Antonio Foncubierta-Rodriguez · Hendrik Strobelt · Mennatallah El-Assady · Paul Jaeger |