Interpreting and Steering LLMs with Mutual Information-based Explanations on Sparse Autoencoders

Xuansheng Wu · Jiayi Yuan · Wenlin Yao · Xiaoming Zhai · Ninghao Liu

Abstract
