Interpreting GFlowNets for Drug Discovery: Extracting Actionable Insights for Medicinal Chemistry
Abstract
Deep generative models are increasingly applied to molecular design in drug discovery, where they explore vast chemical spaces while respecting synthesizability constraints. SynFlowNet [1], a hierarchical Generative Flow Network (GFlowNet), addresses this challenge by constructing molecules through sequential reaction templates and building blocks. Yet the internal representations and decision policies of such models remain opaque, limiting interpretability and trust. From an ML perspective, hierarchical GFlowNets are an emerging class of structured generative models whose interpretability remains largely unexplored. Bridging this gap advances transparency in generative ML while yielding methods that extend beyond the chemistry domain. We introduce a unified interpretability framework for reaction-graph GFlowNets that adapts modern ML analysis tools to scientific generative models. Our approach integrates two complementary perspectives. First, gradient-based saliency with counterfactual analysis: we compute gradients of action log-probabilities and map them to atom-level heatmaps. To move beyond correlation, we perturb chemical motifs with SMARTS-based masking and quantify the resulting probability shifts, yielding both attribution maps and causal evidence for which substructures drive decisions. Second, concept attribution in latent space: we train sparse autoencoders (SAEs) and linear probes on SynFlowNet embeddings to uncover interpretable factors and motifs encoded by the model. We find that SynFlowNet, when trained with QED (drug-likeness) as the reward, does not encode it in a single latent dimension. Instead, sparse autoencoders disentangle QED into interpretable axes such as size, polarity, and lipophilicity, which are more linearly predictable than QED itself. Linear probes accurately detect chemically meaningful motifs (e.g., functional groups, rings, halogens), showing that domain concepts are directly recoverable.
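The linear-probe idea above can be illustrated with a minimal sketch. All data and dimensions here are hypothetical stand-ins: `make_embedding` fabricates toy "embeddings" in which one coordinate weakly encodes a binary motif label (e.g., "contains a halogen"), and a logistic probe is fit by plain gradient descent to test whether the concept is linearly recoverable. The real method probes SynFlowNet's learned embeddings, not synthetic vectors.

```python
import math
import random

random.seed(0)

def make_embedding(has_motif):
    # Hypothetical embedding: dimension 0 weakly encodes the motif,
    # the remaining dimensions are pure noise.
    base = 1.0 if has_motif else -1.0
    return [base + random.gauss(0, 0.3)] + [random.gauss(0, 1) for _ in range(3)]

# Toy dataset: fixed embeddings z with binary concept labels y.
data = [(make_embedding(i % 2 == 0), 1.0 if i % 2 == 0 else 0.0) for i in range(200)]

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

# Fit a logistic probe (w, b) with batch gradient descent on the log loss.
w, b = [0.0] * 4, 0.0
lr = 0.1
for _ in range(200):
    gw, gb = [0.0] * 4, 0.0
    for z, y in data:
        p = sigmoid(sum(wi * zi for wi, zi in zip(w, z)) + b)
        for j in range(4):
            gw[j] += (p - y) * z[j]
        gb += p - y
    w = [wi - lr * gwi / len(data) for wi, gwi in zip(w, gw)]
    b -= lr * gb / len(data)

# If the concept is linearly encoded, the probe separates the classes well.
accuracy = sum(
    (sigmoid(sum(wi * zi for wi, zi in zip(w, z)) + b) > 0.5) == (y == 1.0)
    for z, y in data
) / len(data)
```

In the paper's setting, the probe's accuracy on held-out molecules is the evidence that a motif or property is linearly decodable from the model's latent space.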
Counterfactual analyses further improve QED optimization by identifying and altering reward-critical substructures. By combining saliency, counterfactuals, and concept attribution, our framework offers the first toolkit for interpreting GFlowNets in molecular design. This not only demonstrates how interpretability methods from vision and language can be extended to structured generative models, but also provides actionable insights for medicinal chemists, helping bridge model behavior to chemical reasoning and accelerating drug discovery.

[1] Miruna Cretu et al. SynFlowNet: Design of Diverse and Novel Molecules with Synthesis Constraints. arXiv preprint, 2024.
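The counterfactual masking step can be sketched in miniature. Everything here is a hypothetical stand-in: atoms are rows of a feature matrix, `action_logit` is a linear toy policy rather than SynFlowNet's graph network, and the "SMARTS match" is just a set of atom indices. The point is the mechanics: zero out a motif's features, recompute the action probability, and read the shift as causal evidence for that substructure.

```python
import math

def action_logit(atom_features, weights):
    # Linear stand-in for the policy's action logit (hypothetical weights).
    return sum(w * f for feats in atom_features for w, f in zip(weights, feats))

def action_prob(atom_features, weights):
    # Sigmoid of the logit: probability of taking the action.
    return 1.0 / (1.0 + math.exp(-action_logit(atom_features, weights)))

def mask_motif(atom_features, motif_atoms):
    # Zero out the features of atoms matched by the motif
    # (in the real method, atoms matched by a SMARTS pattern).
    return [[0.0] * len(f) if i in motif_atoms else f
            for i, f in enumerate(atom_features)]

# Toy molecule: three atoms with two features each; atoms {0, 1} form the motif.
mol = [[1.0, 0.5], [0.8, 0.2], [0.1, 0.9]]
w = [0.7, -0.3]

p_orig = action_prob(mol, w)
p_masked = action_prob(mask_motif(mol, {0, 1}), w)
shift = p_orig - p_masked  # how much the motif drives this decision
```

A large positive `shift` indicates the motif supports the action; repeating this over a motif library turns correlational saliency maps into counterfactual attributions.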