Enhancing the FAIRness of Public Transcriptomics Data through AI-Accelerated Computational Curation
Serghei Mangul
Abstract
This dataset aims to accelerate the discovery of genetic and cellular mechanisms in human diseases, particularly in immunology, by addressing the lack of comprehensive metadata in public transcriptomics data. The central AI task is a multi-modal data enrichment and prediction task. The proposed dataset will train models to: (1) Predict Missing or Erroneous Metadata, such as biological sex, tissue type, and disease state, from raw RNA-Seq data; (2) Generate Novel Biological Features, like cell type composition and immune repertoires, which are not explicitly measured; and (3) Integrate Cross-modal Information to discover new gene-phenotype associations, enabling large-scale, cell-type-specific Expression Quantitative Trait Loci (eQTL) studies.