Skip to yearly menu bar Skip to main content

Workshop: AI for Science: from Theory to Practice

scCLIP: Multi-modal Single-cell Contrastive Learning Integration Pre-training

Lei Xiong · Tianlong Chen · Manolis Kellis

Abstract: Recent advances in multi-modal single-cell sequencing technologies enable the simultaneous profiling of chromatin accessibility and transcriptome in individual cells. Integration analysis of multi-modal single-cell data offers a more comprehensive understanding of the regulatory mechanisms linking chromatin status and gene expression, driving cellular processes and diseases. In order to acquire features that align peaks and genes within the same embedding space and facilitate seamless zero-shot transfer to new data, we introduced $\texttt{scCLIP}$ (single-cell Contrastive Learning Integration Pretraining), a generalized multi-modal transformer model with contrastive learning. We show that this model outperforms other competing methods, and beyond this, $\texttt{scCLIP}$ learns transferable features across modalities and generalizes to unseen datasets, which pose the great potential to bridge the vast number of unpaired unimodal datasets both existing and new data generated in the future. Specifically, we propose the first large-scale transformer model designed for single-cell ATAC-seq data by patching peaks across the genomes and representing each patch as a token. This innovative approach enables us effectively to address the scalability challenges posed by scATAC-seq, even when dealing with datasets of up to $\textit{one million}$ dimensions. Codes are provided at:

Chat is not available.