Learning monosemantic features in multitask DNA regulatory sequence models via sparse autoencoder decomposition
Abstract
Deep learning models for regulatory genomics achieve high predictive performance across diverse molecular phenotypes, yet their internal representations remain opaque. Here, we apply sparse autoencoders (SAEs) to decompose the learned representations of Borzoi, a state-of-the-art CNN-transformer that predicts genome-wide transcriptional and epigenetic profiles from DNA sequence. Training TopK-SAEs on activations from Borzoi's early convolutional layers, we discover monosemantic regulatory features that correspond to transcription factor (TF) motifs, RNA-binding protein (RBP) motifs, and transposable element sequences. Using motif discovery with the MEME Suite and matching against known TF databases, we identify hundreds of significant position weight matrices that map SAE-discovered features to established TF binding sites. This work demonstrates that SAEs can systematically decompose regulatory genomics models into biologically interpretable components.
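For readers unfamiliar with the TopK-SAE formulation, the following is a minimal NumPy sketch of a single forward pass, not the implementation used in this work. The function name `topk_sae_forward`, the dimensions, the random weights, the subtraction of the decoder bias before encoding, and the ReLU applied to the retained latents are all illustrative assumptions; real inputs would be per-position activations extracted from one of Borzoi's convolutional layers.

```python
import numpy as np

def topk_sae_forward(x, W_enc, b_enc, W_dec, b_dec, k):
    """Encode activations, keep the k largest latents per row, then decode."""
    # Pre-activations of all latents; centering by b_dec is an assumed convention.
    pre = (x - b_dec) @ W_enc + b_enc              # shape (n, n_latents)
    # Column indices of the k largest pre-activations in each row.
    idx = np.argpartition(pre, -k, axis=1)[:, -k:]
    rows = np.arange(pre.shape[0])[:, None]
    # All latents outside the top k are set to zero (the sparsity constraint).
    z = np.zeros_like(pre)
    z[rows, idx] = np.maximum(pre[rows, idx], 0.0)  # assumed ReLU on kept values
    # Linear reconstruction of the input activations.
    x_hat = z @ W_dec + b_dec
    return z, x_hat

# Hypothetical sizes and random stand-in data, for illustration only.
rng = np.random.default_rng(0)
d_model, n_latents, k = 16, 64, 4
x = rng.normal(size=(8, d_model))
W_enc = rng.normal(size=(d_model, n_latents)) * 0.1
W_dec = W_enc.T.copy()
b_enc = np.zeros(n_latents)
b_dec = np.zeros(d_model)

z, x_hat = topk_sae_forward(x, W_enc, b_enc, W_dec, b_dec, k)
```

Each row of `z` has at most `k` nonzero entries, so every input position is explained by a small number of candidate features. Training would minimize the reconstruction error between `x` and `x_hat`.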