20 Results
Workshop | Sat 14:30 | Open Problems in Universality: A mechanistic interpretability perspective | Neel Nanda
Poster | Fri 11:00 | InterpBench: Semi-Synthetic Transformers for Evaluating Mechanistic Interpretability Techniques | Rohan Gupta · Iván Arcuschin Moreno · Thomas Kwa · Adrià Garriga-Alonso
Workshop | | The Multi-faceted Monosemanticity in Multimodal Representations | Hanqi Yan · Yulan He · Yifei Wang
Workshop | | Constrained Belief Updating and Geometric Structures in Transformer Representations | Mateusz Piotrowski · Paul Riechers · Daniel Filan · Adam Shai
Poster | Wed 16:30 | Iteration Head: A Mechanistic Study of Chain-of-Thought | Vivien Cabannes · Charles Arnal · Wassim Bouaziz · Xingyu Yang · Francois Charton · Julia Kempe
Poster | Thu 11:00 | Compact Proofs of Model Performance via Mechanistic Interpretability | Jason Gross · Rajashree Agrawal · Thomas Kwa · Euan Ong · Chun Hei Yip · Alex Gibson · Soufiane Noubir · Lawrence Chan
Workshop | | How Transformers Reason: A Case Study on a Synthetic Propositional Logic Problem | Guan Zhe Hong · Nishanth Dikkala · Enming Luo · Cyrus Rashtchian · Xin Wang · Rina Panigrahy
Workshop | | Ablation is Not Enough to Emulate DPO: A Mechanistic Analysis of Toxicity Reduction | Yushi Yang · Filip Sondej · Harry Mayne · Adam Mahdi
Workshop | | SCIURus: Shared Circuits for Interpretable Uncertainty Representations in Language Models | Carter Teplica · Yixin Liu · Arman Cohan · Tim G. J. Rudner
Workshop | Sat 15:45 | SCIURus: Shared Circuits for Interpretable Uncertainty Representations in Language Models | Carter Teplica · Yixin Liu · Arman Cohan · Tim G. J. Rudner
Workshop | | Active-Dormant Attention Heads: Mechanistically Demystifying Extreme-Token Phenomena in LLMs | Tianyu Guo · Druv Pai · Yu Bai · Jiantao Jiao · Michael Jordan · Song Mei
Workshop | Sat 15:30 | Active-Dormant Attention Heads: Mechanistically Demystifying Extreme-Token Phenomena in LLMs | Tianyu Guo · Druv Pai · Yu Bai · Jiantao Jiao · Michael Jordan · Song Mei