Explainable AI–Guided Virtual Experiments Reveal How DNA Sequence Context Shapes Gene Regulation
Abstract
Deciphering the cis-regulatory code, the rules by which DNA sequence governs gene regulation, is a central challenge in biology with wide-ranging implications for understanding disease mechanisms and engineering DNA for synthetic biology and therapeutic applications. Deep learning models consistently achieve state-of-the-art performance in predicting regulatory activity from DNA sequence, but their black-box nature limits mechanistic insight. Post hoc interpretability tools have identified important sequence motifs corresponding to transcription factor (TF) binding sites, yet the quantitative contribution of surrounding sequence context remains poorly understood. Here, we treat a high-performing sequence-to-function model as a virtual experimental platform, pairing explainable AI with large-scale in silico motif-context swap experiments to quantify the relative contributions of TF motifs and surrounding sequence context to the model’s predicted enhancer activity. Using attribution maps, we identify and localize motif instances, then systematically transplant identical motif syntax between different sequence contexts and measure changes in predicted activity to estimate each component’s effect. Surprisingly, we find that sequence context plays an outsized role compared to motifs, sometimes accounting for most of the predicted activity. Context effects are most pronounced in housekeeping gene programs, where motifs modestly tune a baseline set by sequence context, whereas developmental programs show stronger motif-driven regulation. Our results motivate a paradigm shift from motif-centric models toward quantitative motif–context frameworks that treat sequence context as an active component of the cis-regulatory code rather than a passive scaffold.
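As an illustration only, the motif-context swap described above can be sketched for a single pair of sequences. The sketch assumes two equal-length sequences that share one motif window. The names `transplant`, `toy_predict`, and `swap_effects` are hypothetical, and `toy_predict` is a hand-written stand-in for a trained sequence-to-function model. Averaging differences over the resulting 2x2 design is one simple way to separate a motif effect from a context effect; it is an assumption of this sketch, not necessarily the estimator used in the study.

```python
from itertools import product
from typing import Callable, Tuple


def transplant(motif_donor: str, context_donor: str, start: int, end: int) -> str:
    """Return context_donor with positions [start, end) taken from motif_donor."""
    if len(motif_donor) != len(context_donor):
        raise ValueError("sequences must have equal length")
    return context_donor[:start] + motif_donor[start:end] + context_donor[end:]


def toy_predict(seq: str) -> float:
    """Hypothetical stand-in for a model: GC-driven baseline plus a motif bonus."""
    gc_fraction = sum(base in "GC" for base in seq) / len(seq)
    motif_bonus = 1.0 if "TGACTCA" in seq else 0.0
    return 2.0 * gc_fraction + motif_bonus


def swap_effects(
    seq_a: str,
    seq_b: str,
    start: int,
    end: int,
    predict: Callable[[str], float],
) -> Tuple[float, float]:
    """Estimate (motif_effect, context_effect) from a 2x2 swap design.

    y[(m, c)] is the prediction for the motif window of sequence m
    placed into the flanking context of sequence c.
    """
    seqs = {"a": seq_a, "b": seq_b}
    y = {
        (m, c): predict(transplant(seqs[m], seqs[c], start, end))
        for m, c in product("ab", repeat=2)
    }
    # Effect of swapping the motif window, averaged over both contexts.
    motif_effect = ((y["a", "a"] - y["b", "a"]) + (y["a", "b"] - y["b", "b"])) / 2
    # Effect of swapping the context, averaged over both motif windows.
    context_effect = ((y["a", "a"] - y["a", "b"]) + (y["b", "a"] - y["b", "b"])) / 2
    return motif_effect, context_effect


if __name__ == "__main__":
    # Toy sequences: a GC-rich context carrying a motif, an AT-rich context without one.
    seq_a = "GCGCG" + "TGACTCA" + "GCGCG"
    seq_b = "ATATA" + "TTTTTTT" + "ATATA"
    motif_effect, context_effect = swap_effects(seq_a, seq_b, 5, 12, toy_predict)
    print(f"motif effect: {motif_effect:.3f}, context effect: {context_effect:.3f}")
```

In practice, `predict` would wrap the trained model, the window coordinates would come from attribution maps rather than being fixed by hand, and the estimates would be aggregated over many sequence pairs.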