
Workshop: AI for Science: from Theory to Practice

Automated distillation of genomic equations governing single cell gene expression

Edouardo Honig · Frederique Ruf Zamojski · Stuart Sealfon · Ying Nian Wu · Zijun Frank Zhang


Gene expression is an essential cellular process controlled by a complex, orchestrated regulatory network of transcription factors and epigenetic modifications. Advances in single-cell RNA sequencing enable the investigation of gene expression control at unprecedentedly fine resolution and large scale. Yet understanding the sequence determinants underlying distinct primary cell types remains challenging. While deep neural networks have shown strong performance in predicting gene expression, their predictions lack meaningful explanations, especially for a systematic understanding of the molecular mechanisms, which motivates the search for more transparent models. We present an automated model that predicts gene expression from genetic sequences while providing both strong performance and direct interpretations of its predictions. Our model combines a pre-trained genetic sequence class model and neural architecture search with symbolic regression to distill explainable genomic equations. We applied our method to an in-house single-cell gene expression dataset from the human pituitary (a specialized gland in the brain that controls the endocrine system). The prediction accuracy of the distilled genomic equations (Pearson r=0.713) is comparable to that of other explainable models, without artificially introducing a strong inductive bias that may not hold for the complex and potentially non-linear cellular system. The genomic equations shed light on how sequence classes interact and regulate the cell type-specific, finely controlled transcriptomic program in the human endocrine system. To our knowledge, this is the first attempt at distilling genomic equations from neural networks using symbolic regression.
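
The abstract reports accuracy as a Pearson correlation between predicted and observed expression. As a minimal sketch of how that metric is computed, assuming the predictions and measurements are available as equal-length one-dimensional arrays (the function name `pearson_r` and the toy values are illustrative and do not come from the authors' code or data):

```python
import numpy as np


def pearson_r(predicted, observed):
    """Pearson correlation between two equal-length 1-D sequences."""
    predicted = np.asarray(predicted, dtype=float)
    observed = np.asarray(observed, dtype=float)
    if predicted.ndim != 1 or predicted.shape != observed.shape:
        raise ValueError("inputs must be 1-D and of equal length")

    # Center both vectors, then divide their dot product by the
    # product of their Euclidean norms.
    p = predicted - predicted.mean()
    o = observed - observed.mean()
    denom = np.sqrt((p ** 2).sum() * (o ** 2).sum())
    if denom == 0.0:
        raise ValueError("correlation is undefined for constant input")
    return float((p * o).sum() / denom)


# Hypothetical toy values, not taken from the pituitary dataset.
observed = np.array([1.0, 2.0, 3.0, 4.0])
predicted = np.array([1.1, 1.9, 3.2, 3.8])
r = pearson_r(predicted, observed)
```

A value of 1 indicates a perfect positive linear relationship and 0 indicates none, so the reported r=0.713 corresponds to a fairly strong linear agreement between predicted and measured expression.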
