Towards Best Practices of Activation Patching in Language Models: Metrics and Methods

Fred Zhang · Neel Nanda

Abstract
