scCMap: Connecting Genetic and Chemical Perturbations at Single-Cell Resolution
Abstract
Systematic mapping of cellular perturbations is fundamental to biomedical research. Pioneering efforts such as the Connectivity Map (CMap) established a landmark paradigm by linking compound and gene interventions through the bulk gene expression changes they induce. More recent initiatives, Tahoe100M and X-Atlas/Orion, demonstrated the feasibility of large-scale single-cell transcriptomic profiling of chemical and genetic perturbations, respectively. Yet each focuses on a single perturbation domain, and there is currently no comprehensive perturbation resource that systematically integrates both domains at single-cell resolution. We therefore propose to construct scCMap, an experimentally harmonized single-cell transcriptomic perturbation map derived from both genetic and chemical screens and designed to enable cross-domain perturbation connection, analysis, and modeling. scCMap will capture complex multi-scale perturbation reagent information, extensive phenotypic matrices, and rich metadata. By providing a unified, high-quality data resource, scCMap will support a range of AI tasks, including multi-modal molecular representation learning, cross-domain perturbation transfer, biological causal inference, universal single-cell foundation model construction, and virtual AI cell simulation. We anticipate that scCMap will mark a new milestone in biomedical research and accelerate AI-powered precision drug discovery.