Biological Pathway Informed Models with Graph Attention Networks (GATs)
Abstract
Biological pathways map the gene-gene interactions that govern cellular processes. Despite their importance, most ML models treat genes as unstructured tokens, discarding known topology. Recent pathway-informed methods add pathway-pathway priors, but still use set-MLP pooling within a pathway, losing intra-pathway structure. We introduce a relation-aware GAT that performs masked, multi-relation (e.g., activating/inhibiting) message passing over curated pathway adjacencies to generate pathway embeddings from gene-level data. We show that GATs generalize better than MLPs, achieving an 81% reduction in MSE when predicting pathway dynamics under unseen treatment conditions. We further validate our biological prior by encoding drug mechanisms as edge interventions, which improves model robustness. Finally, when trained without any pathway prior, the GAT model correctly rediscovers all five signed gene-gene interactions in the canonical TP53-MDM2-MDM4 feedback loop from raw time-series mRNA data, demonstrating its potential to generate novel biological hypotheses. [All code will be released upon publication.]
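To make the phrase "masked, multi-relation message passing" concrete, the following is a minimal NumPy sketch of one attention layer in that spirit. It is an illustration under stated assumptions, not the implementation described in this paper: the function name `masked_relation_attention`, the per-relation projections `w_rel`, the attention vectors `a_rel`, the joint softmax over relations and source genes, and the toy three-gene graph are all hypothetical choices made for the example.

```python
import numpy as np

def masked_relation_attention(x, adj, w_rel, a_rel):
    """One masked, relation-aware attention layer (illustrative sketch only).

    x:     (n_genes, d_in) gene-level features
    adj:   (n_rel, n_genes, n_genes) binary masks; adj[r, i, j] = 1 means
           gene j acts on gene i through relation r
    w_rel: (n_rel, d_in, d_out) per-relation projections (assumed form)
    a_rel: (n_rel, 2 * d_out) per-relation attention vectors (assumed form)
    """
    n_rel, n, _ = adj.shape
    d_out = w_rel.shape[2]

    # Project gene features separately for each relation type.
    h = np.einsum("nd,rde->rne", x, w_rel)               # (n_rel, n, d_out)

    # GAT-style logits: LeakyReLU(a_r . [h_i || h_j]) for target i, source j.
    tgt = np.einsum("rne,re->rn", h, a_rel[:, :d_out])
    src = np.einsum("rne,re->rn", h, a_rel[:, d_out:])
    logits = tgt[:, :, None] + src[:, None, :]           # (n_rel, i, j)
    logits = np.where(logits > 0, logits, 0.2 * logits)

    # Masking: pairs without a curated edge get zero attention.
    logits = np.where(adj > 0, logits, -np.inf)

    # Softmax jointly over (relation, source) for each target gene.
    flat = logits.transpose(1, 0, 2).reshape(n, n_rel * n)
    row_max = flat.max(axis=1, keepdims=True)
    row_max = np.where(np.isfinite(row_max), row_max, 0.0)
    weights = np.exp(flat - row_max)
    denom = weights.sum(axis=1, keepdims=True)
    alpha = np.divide(weights, denom, out=np.zeros_like(weights), where=denom > 0)
    alpha = alpha.reshape(n, n_rel, n).transpose(1, 0, 2)  # (n_rel, i, j)

    # Aggregate relation-specific messages from neighbouring genes.
    out = np.einsum("rij,rje->ie", alpha, h)             # (n, d_out)
    return out, alpha

# Toy graph with made-up edges: relation 0 = activating, 1 = inhibiting.
rng = np.random.default_rng(0)
x = rng.normal(size=(3, 4))
adj = np.zeros((2, 3, 3))
adj[0, 1, 0] = 1   # gene 0 activates gene 1
adj[1, 0, 1] = 1   # gene 1 inhibits gene 0
adj[1, 0, 2] = 1   # gene 2 inhibits gene 0
w_rel = rng.normal(size=(2, 4, 5))
a_rel = rng.normal(size=(2, 10))
out, alpha = masked_relation_attention(x, adj, w_rel, a_rel)
```

In this sketch a gene with no incoming edges (gene 2 above) receives an all-zero update; a practical model would typically add self-loops or a residual connection, and a pathway embedding could then be read out by pooling the per-gene outputs. Under the same assumptions, an edge intervention of the kind mentioned above would amount to editing entries of `adj` before the forward pass.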