142 Results
Workshop
|
Neurosymbolic AI Reveals Biases and Limitations in ML-Driven Drug Discovery Lauren Nicole DeLong · Yojana Gadiya · Jacques Fleuriot · Daniel Domingo-Fernández |
||
Workshop
|
Attention Lens: A Tool for Mechanistically Interpreting the Attention Head Information Retrieval Mechanism Mansi Sakarvadia · Arham Khan · Aswathy Ajith · Daniel Grzenda · Nathaniel Hudson · André Bauer · Kyle Chard · Ian Foster |
||
Workshop
|
Sparse Autoencoders Find Highly Interpretable Features in Language Models Hoagy Cunningham · Aidan Ewart · Logan Smith · Robert Huben · Lee Sharkey |
||
Workshop
|
Sat 12:01 |
Piecewise Linear Parametrization of Policies: Towards Interpretable Deep Reinforcement Learning Maxime Wabartha · Joelle Pineau |
|
Workshop
|
Is This the Subspace You Are Looking for? An Interpretability Illusion for Subspace Activation Patching Aleksandar Makelov · Georg Lange · Atticus Geiger · Neel Nanda |
||
Workshop
|
Mining the Diamond Miner: Mechanistic Interpretability on the Video PreTraining Agent Sonia Joseph · Artem Zholus · Mohammad Reza Samsami · Blake Richards |
||
Workshop
|
Adversarial Attacks on Neuron Interpretation via Activation Maximization Alex Fulleringer · Geraldin Nanfack · Jonathan Marty · Michael Eickenberg · Eugene Belilovsky |
||
Workshop
|
What's your Use Case? A Taxonomy of Causal Evaluations of Post-hoc Interpretability David Reber · Victor Veitch |
||
Workshop
|
TOD-Flow: Modeling the Structure of Task-Oriented Dialogues Sungryull Sohn · Yiwei Lyu · Anthony Liu · Lajanugen Logeswaran · Dong-Ki Kim · Dongsub Shim · Honglak Lee |
||
Workshop
|
Sat 12:01 |
Are VideoQA Models Truly Multimodal? Ishaan Singh Rawal · Shantanu Jaiswal · Basura Fernando · Cheston Tan |
|
Workshop
|
FlexModel: A Framework for Interpretability of Distributed Large Language Models Matthew Choi · Muhammad Adil Asif · John Willes · David B. Emerson |
||
Workshop
|
Tensor Trust: Interpretable Prompt Injection Attacks from an Online Game Sam Toyer · Olivia Watkins · Ethan Mendes · Justin Svegliato · Luke Bailey · Tiffany Wang · Isaac Ong · Karim Elmaaroufi · Pieter Abbeel · Trevor Darrell · Alan Ritter · Stuart J Russell |