Investigating Protein-DNA Binding Energetics of Mismatched DNA
Abstract
Transcription factors (TFs) bind to regulatory DNA regions and thereby modulate gene expression. Although various high-throughput techniques have been used to characterize protein binding preferences, this work is the first to extend these studies to non-canonical mismatched bases. The mutagenesis study presented here allows us to determine the binding profile across the double-stranded DNA sequence. Additionally, we leverage deep learning to complete the map of pairwise interactions. In this context, we introduce ShapPWM, a motif strategy that marginalizes individual nucleotide contributions by computing Shapley values. Our model reveals that strong synergistic interactions appear between nucleotides in the regions flanking the contacts. This information offers valuable insights into the binding mechanism and reaction energy without the need to solve intricate crystal structures.
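To make the idea of marginalizing per-nucleotide contributions concrete, the following is a minimal sketch of an exact Shapley value computation over sequence positions. It is not the ShapPWM implementation: the function names, the baseline sequence, and the toy scoring function (an additive match score plus one pairwise bonus) are all assumptions chosen for illustration, and a trained model would take the place of `toy_score`.

```python
from itertools import combinations
from math import factorial


def shapley_values(score_fn, seq, baseline):
    """Exact Shapley value of each position of `seq` relative to `baseline`.

    A coalition is a set of positions taken from `seq`; all remaining
    positions are filled from `baseline`. Cost grows as 2^n, so this is
    only practical for short sequences.
    """
    n = len(seq)

    def value(subset):
        hybrid = "".join(seq[i] if i in subset else baseline[i] for i in range(n))
        return score_fn(hybrid)

    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for k in range(n):  # k = size of the coalition without position i
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for subset in combinations(others, k):
                s = frozenset(subset)
                phi[i] += weight * (value(s | {i}) - value(s))
    return phi


def toy_score(s):
    """Hypothetical binding score, NOT a fitted model.

    One unit per position matching the target "ACGT", plus a 0.5
    synergy bonus when positions 0 and 1 both match.
    """
    target = "ACGT"
    matches = [a == b for a, b in zip(s, target)]
    return sum(matches) + (0.5 if matches[0] and matches[1] else 0.0)


# Baseline chosen (as an assumption) to mismatch the target at every position.
phi = shapley_values(toy_score, "ACGT", "TTTA")
print(phi)
```

In this toy setting the pairwise bonus is shared equally between the two interacting positions, and the values sum to the score difference between the sequence and the baseline, which is the efficiency property that makes Shapley values usable as per-nucleotide contributions.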