

Search All 2023 Events
 

142 Results

Workshop
Neurosymbolic AI Reveals Biases and Limitations in ML-Driven Drug Discovery
Lauren Nicole DeLong · Yojana Gadiya · Jacques Fleuriot · Daniel Domingo-Fernández
Workshop
Attention Lens: A Tool for Mechanistically Interpreting the Attention Head Information Retrieval Mechanism
Mansi Sakarvadia · Arham Khan · Aswathy Ajith · Daniel Grzenda · Nathaniel Hudson · André Bauer · Kyle Chard · Ian Foster
Workshop
Sparse Autoencoders Find Highly Interpretable Features in Language Models
Hoagy Cunningham · Aidan Ewart · Logan Smith · Robert Huben · Lee Sharkey
Workshop
Sat 12:01 Piecewise Linear Parametrization of Policies: Towards Interpretable Deep Reinforcement Learning
Maxime Wabartha · Joelle Pineau
Workshop
Is This the Subspace You Are Looking for? An Interpretability Illusion for Subspace Activation Patching
Aleksandar Makelov · Georg Lange · Atticus Geiger · Neel Nanda
Workshop
Mining the Diamond Miner: Mechanistic Interpretability on the Video PreTraining Agent
Sonia Joseph · Artem Zholus · Mohammad Reza Samsami · Blake Richards
Workshop
Adversarial Attacks on Neuron Interpretation via Activation Maximization
Alex Fulleringer · Geraldin Nanfack · Jonathan Marty · Michael Eickenberg · Eugene Belilovsky
Workshop
What's your Use Case? A Taxonomy of Causal Evaluations of Post-hoc Interpretability
David Reber · Victor Veitch
Workshop
TOD-Flow: Modeling the Structure of Task-Oriented Dialogues
Sungryull Sohn · Yiwei Lyu · Anthony Liu · Lajanugen Logeswaran · Dong-Ki Kim · Dongsub Shim · Honglak Lee
Workshop
Sat 12:01 Are VideoQA Models Truly Multimodal?
Ishaan Singh Rawal · Shantanu Jaiswal · Basura Fernando · Cheston Tan
Workshop
FlexModel: A Framework for Interpretability of Distributed Large Language Models
Matthew Choi · Muhammad Adil Asif · John Willes · David B. Emerson
Workshop
Tensor Trust: Interpretable Prompt Injection Attacks from an Online Game
Sam Toyer · Olivia Watkins · Ethan Mendes · Justin Svegliato · Luke Bailey · Tiffany Wang · Isaac Ong · Karim Elmaaroufi · Pieter Abbeel · Trevor Darrell · Alan Ritter · Stuart J Russell