Sun 8:45 a.m. - 9:00 a.m. | Welcome and Opening Remarks (Intro)
Sun 9:00 a.m. - 9:45 a.m. | Atticus Geiger: The Current State of Interpretability and Ideas for Scaling Up (Invited Talk) | Atticus Geiger
Sun 9:45 a.m. - 10:15 a.m. | Spotlight Talks
Sun 9:45 a.m. - 9:51 a.m. | LoFiT: Localized Fine-tuning on LLM Representations (Spotlight Talk) | Fangcong Yin · Xi Ye · Greg Durrett
Sun 9:51 a.m. - 9:57 a.m. | Decomposing and Editing Predictions by Modeling Model Computation (Spotlight Talk) | Harshay Shah · Andrew Ilyas · Aleksander Madry
Sun 9:57 a.m. - 10:03 a.m. | Analyzing (In)Abilities of SAEs via Formal Languages (Spotlight Talk) | Abhinav Menon · Manish Shrivastava · David Krueger · Ekdeep S Lubana
Sun 10:03 a.m. - 10:09 a.m. | Towards Reliable Evaluation of Behavior Steering Interventions in LLMs (Spotlight Talk) | Itamar Pres · Laura Ruis · Ekdeep S Lubana · David Krueger
Sun 10:09 a.m. - 10:15 a.m. | Probing the Decision Boundaries of In-context Learning in Large Language Models (Spotlight Talk) | Siyan Zhao
Sun 10:15 a.m. - 10:45 a.m. | Coffee Break
Sun 10:45 a.m. - 11:30 a.m. | Fernanda Viégas: AI Dashboard Design: A User-Centered Approach to Interpretability (Invited Talk) | Fernanda Viégas
Sun 11:30 a.m. - 12:00 p.m. | Junior Panel Discussion (Panel Discussion)
Sun 12:00 p.m. - 1:00 p.m. | Lunch Break
Sun 1:00 p.m. - 2:00 p.m. | Poster Session
Sun 2:00 p.m. - 2:45 p.m. | David Ha: The Future of Collective Intelligence and Meta Evolution for Foundation Models (Invited Talk) | David Ha
Sun 2:45 p.m. - 3:15 p.m. | Coffee Break
Sun 3:15 p.m. - 4:00 p.m. | Jacob Steinhardt: Scalably Understanding AI with AI (Invited Talk) | Jacob Steinhardt
Sun 4:00 p.m. - 4:55 p.m. | Panel Discussion | Fernanda Viégas · Neel Nanda · Atticus Geiger · Jacob Steinhardt
Sun 4:55 p.m. - 5:00 p.m. | Closing Remarks and Award Ceremony (Outro)
Overcoming Limitations of Steering Vectors with Low-Rank Representation Steering (Poster) | Dmitrii Krasheninnikov · David Krueger
Do LLMs "internally know" when they follow instructions? (Poster) | Juyeon Heo · Christina Heinze-Deml · Shirley Ren · Oussama Elachqar · Udhyakumar Nallasamy · Andy Miller · Jaya Narain
LoFiT: Localized Fine-tuning on LLM Representations (Poster) | Fangcong Yin · Xi Ye · Greg Durrett
Ablation is Not Enough to Emulate DPO: A Mechanistic Analysis of Toxicity Reduction (Poster) | Yushi Yang · Filip Sondej · Harry Mayne · Adam Mahdi
Is Free Self-Alignment Possible? (Poster) | Dyah Adila · Changho Shin · Yijing Zhang · Frederic Sala
Steering semantic search with interpretable features from sparse autoencoders (Poster) | Christine Ye · Charles O'Neill · John Wu · Kartheik Iyer
Zero-to-Hero: Enhancing Zero-Shot Novel View Synthesis via Attention Map Filtering (Poster) | Ido Sobol · Chenfeng Xu · Or Litany
Steering Large Language Models using Conceptors: Improving Addition-Based Activation Engineering (Poster) | Joris Postmus · Steven Abreu
Measuring the Reliability of Causal Probing Methods: Tradeoffs, Limitations, and the Plight of Nullifying Interventions (Poster) | Marc Canby · Adam Davies · Chirag Rastogi · Julia C Hockenmaier
Uncovering Uncertainty in Transformer Inference (Poster) | Greyson Brothers · Willa Mannering · John Winder · Amber Tien
Algorithmic Oversight for Deceptive Reasoning (Poster) | Ege Onur Taga · Mingchen Li · Yongqi Chen · Samet Oymak
Probing the Decision Boundaries of In-context Learning in Large Language Models (Poster) | Siyan Zhao · Tung Nguyen · Aditya Grover
Comparing Bottom-Up and Top-Down Steering Approaches on In-Context Learning Tasks (Poster) | Madeline Brumley · Joe Kwon · David Krueger · Dmitrii Krasheninnikov · Usman Anwar
Linearly Controlled Language Generation with Performative Guarantees (Poster) | Emily Cheng · Marco Baroni · Carmen Amo Alonso
Entropy-Based Decoding for Retrieval-Augmented Large Language Models (Poster) | Zexuan Qiu · Zijing Ou · Bin Wu · Jingjing Li · Aiwei Liu · Irwin King
Toward Explanation Bottleneck Models (Poster) | Shin'ya Yamaguchi · Kosuke Nishida
Can sparse autoencoders be used to decompose and interpret steering vectors? (Poster) | Harry Mayne · Yushi Yang · Adam Mahdi
WISE: Rethinking the Knowledge Memory for Lifelong Model Editing of Large Language Models (Poster) | Peng Wang · Zexi Li · Ningyu Zhang · Ziwen Xu · Yunzhi Yao · Yong Jiang · Pengjun Xie · Fei Huang · Huajun Chen
Representation Tuning (Poster) | Christopher Ackerman
SCIURus: Shared Circuits for Interpretable Uncertainty Representations in Language Models (Poster) | Carter Teplica · Yixin Liu · Arman Cohan · Tim G. J. Rudner
Understanding Visual Concepts Across Models (Poster) | Brandon Trabucco · Max Gurinas · Kyle Doherty · Ruslan Salakhutdinov
Secret Seeds in Text-to-Image Diffusion Models (Poster) | Katherine Xu · Lingzhi Zhang · Jianbo Shi
Analyzing (In)Abilities of SAEs via Formal Languages (Poster) | Abhinav Menon · Manish Shrivastava · Ekdeep S Lubana · David Krueger
Pay Attention to What Matters (Poster) | Pedro Silva · Fadhel Ayed · Antonio De Domenico · Ali Maatouk
Decomposing and Editing Predictions by Modeling Model Computation (Poster) | Harshay Shah · Andrew Ilyas · Aleksander Madry
Linguistic Minimal Pairs Elicit Linguistic Similarity in Large Language Models (Poster) | Xinyu Zhou · Delong Chen · Samuel Cahyawijaya · Xufeng Duan · Zhenguang Cai
Semantic Entropy Neurons: Encoding Semantic Uncertainty in the Latent Space of LLMs (Poster) | Jiatong Han · Jannik Kossen · Muhammed Razzak · Yarin Gal
Towards Reliable Evaluation of Behavior Steering Interventions in LLMs (Poster) | Itamar Pres · Laura Ruis · Ekdeep S Lubana · David Krueger
Unveiling and Manipulating Concepts in Time Series Foundation Models (Poster) | Michal Wilinski · Mononito Goswami · Nina Żukowska · Willa Potosnak · Artur Dubrawski
GPT-2 Small Fine-Tuned on Logical Reasoning Summarizes Information on Punctuation Tokens (Poster) | Sonakshi Chauhan · Atticus Geiger
Extracting Paragraphs from LLM Token Activations (Poster) | Nicky Pochinkov · Angelo Benoit · Lovkush Agarwal · Zainab Ali Majid · Lucile Ter-Minassian
Analysing the Residual Stream of Language Models Under Knowledge Conflicts (Poster) | Yu Zhao · Xiaotang Du · Giwon Hong · Aryo Gema · Alessio Devoto · Hongru WANG · Xuanli He · Kam-Fai Wong · Pasquale Minervini
Dipper: Diversity in Prompts for Producing Large Language Model Ensembles in Reasoning tasks (Poster) | Gregory Kang Ruey Lau · Wenyang Hu · Liu Diwen · Chen Jizhuo · See-Kiong Ng · Bryan Kian Hsiang Low