A single-cell gene expression language model
William Connell · Umair Khan · Michael Keiser
Event URL: https://openreview.net/forum?id=XxRuCIgq2LX

Gene regulation is a dynamic process that connects genotype and phenotype. Given the difficulty of physically mapping mammalian gene circuitry, we require new computational methods to learn regulatory rules. Natural language is a valuable analogy to the communication of regulatory control. Machine learning systems model natural language by explicitly learning context dependencies between words. We propose a similar system applied to single-cell RNA expression profiles to learn context dependencies between genes. Our model, Exceiver, is trained across a diversity of cell types using a self-supervised task formulated for discrete count data, accounting for feature sparsity. We found agreement between the similarity profiles of latent sample representations and learned gene embeddings with respect to biological annotations. We evaluated Exceiver on a new dataset and a downstream prediction task and found that pretraining supports transfer learning. Our work provides a framework to model gene regulation on a single-cell level and transfer knowledge to downstream tasks.

Author Information

William Connell (University of California, San Francisco)
Umair Khan (University of California, San Francisco)
Michael Keiser (University of California, San Francisco)

Michael J. Keiser, PhD, is a Chan Zuckerberg Initiative Ben Barres Investigator and an Allen Distinguished Investigator. Michael joined the UCSF faculty as an Assistant Professor in 2014, in the Dept. of Pharmaceutical Chemistry and the Institute for Neurodegenerative Diseases, with appointments in the Dept. of Bioengineering & Therapeutic Sciences and the Bakar Computational Health Sciences Institute. Before this, he co-founded a startup bringing systems pharmacology methods for drug-target prediction to pharma and the US FDA, where they are in use today. He holds multiple degrees from Stanford, including a BSc in Computer Science. Broadly, the Keiser lab combines machine learning with chemical biology methods to investigate how drug-like small molecules perturb protein networks to achieve their therapeutic effects.

More from the Same Authors

  • 2019: Molecules and Genomes
    David Haussler · Djork-Arné Clevert · Michael Keiser · Alan Aspuru-Guzik · David Duvenaud · David Jones · Jennifer Wei · Alexander D'Amour