A pipeline for interpretable neural latent discovery
Abstract
Mechanistic understanding of the brain requires interpretable descriptions of large-scale neuronal computations. Many latent variable modeling approaches excel at decoding but produce complex, opaque latent spaces. We address this with NLDisco, a pipeline for interpretable neural latent discovery. Motivated by successful applications of sparse dictionary learning in AI mechanistic interpretability, NLDisco encourages hidden-layer neurons in sparse encoder-decoder models to learn interpretable representations. A flexible, user-friendly software package supports interpretable latent discovery across recording modalities and experimental paradigms. We validate the pipeline on a synthetic dataset, demonstrating that it recovers ground-truth features and reveals meaningful representations. We conclude by discussing future development and applications, emphasizing the pipeline's potential to facilitate neuroscientific discovery and clearer insight into neural computations.
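To make the sparse encoder-decoder idea concrete, the sketch below shows the kind of model the abstract alludes to: an autoencoder whose reconstruction loss is augmented with an L1 penalty on hidden activations, so that individual latent units are pushed toward sparse, potentially interpretable features. This is a minimal illustration under assumed PyTorch conventions; the class and parameter names (SparseAutoencoder, l1_weight) are hypothetical and do not reflect the NLDisco package's actual API.

```python
import torch
import torch.nn as nn

class SparseAutoencoder(nn.Module):
    """Toy sparse encoder-decoder in the spirit of sparse dictionary learning."""
    def __init__(self, n_inputs: int, n_latents: int):
        super().__init__()
        self.encoder = nn.Linear(n_inputs, n_latents)
        self.decoder = nn.Linear(n_latents, n_inputs)

    def forward(self, x):
        z = torch.relu(self.encoder(x))   # non-negative latent activations
        x_hat = self.decoder(z)
        return x_hat, z

def loss_fn(x, x_hat, z, l1_weight=1e-3):
    # Reconstruction error plus an L1 penalty that encourages each latent
    # unit to stay silent unless it captures a distinct feature.
    recon = ((x - x_hat) ** 2).mean()
    sparsity = z.abs().mean()
    return recon + l1_weight * sparsity

# Toy usage on random "neural activity" (trials x recorded neurons).
model = SparseAutoencoder(n_inputs=100, n_latents=32)
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
x = torch.randn(256, 100)
for _ in range(200):
    x_hat, z = model(x)
    loss = loss_fn(x, x_hat, z)
    opt.zero_grad()
    loss.backward()
    opt.step()
```

The L1 weight trades off reconstruction fidelity against sparsity; in practice this hyperparameter (and the latent dimensionality) would be tuned to the recording modality and dataset at hand.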