Causal Representation Learning from Multimodal EHRs under Non-Random Modality Missingness
Abstract
Clinical notes contain rich patient information, such as diagnoses and medications, making them valuable for patient representation learning. Recent advances in large language models have further improved the ability to extract meaningful representations from clinical text. However, clinical notes are often missing; for example, 35\% of patients in real-world datasets lack them. In such cases, representations can be learned from other modalities, such as structured data, chest X-rays, or radiology reports. Yet the availability of these modalities is influenced by clinical decision-making and varies across patients, resulting in modality missing-not-at-random (MMNAR) patterns. We propose a causal representation learning framework that leverages both the observed data and the informative missingness in multimodal clinical records. It consists of: (1) an MMNAR-aware modality fusion module that uses pre-trained models and other encoders to capture both patient health and the reasons for missing data in representation learning; (2) a representation construction module that enforces semantic sufficiency and distributional alignment across missingness patterns via cross-modal reconstruction and contrastive learning; and (3) a multitask prediction model fine-tuned for each missingness pattern, with a rectifier that corrects residual bias. On the MIMIC-IV dataset, our framework significantly outperforms recent baselines: AUC/APR improves by 13.15\%/12.88\% for hospital readmission and by 25.45\%/81.22\% for ICU admission.
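To make the architecture concrete, below is a minimal sketch, not the paper's implementation, of two of the abstract's ideas: a fusion module that embeds the missingness pattern itself alongside the available modality embeddings, and a per-pattern rectifier that adds a learned residual correction to task logits. PyTorch is assumed; all class names (MMNARFusion, PatternRectifier), dimensions, and the binary pattern encoding are hypothetical choices for illustration.

\begin{verbatim}
# Hedged sketch: MMNAR-aware fusion + per-pattern rectifier (hypothetical).
import torch
import torch.nn as nn

class MMNARFusion(nn.Module):
    """Fuses available modality embeddings and also encodes the
    missingness mask, so *which* modalities exist informs the
    representation (the MMNAR-aware idea from the abstract)."""
    def __init__(self, n_modalities: int, d_in: int, d_rep: int):
        super().__init__()
        self.proj = nn.ModuleList(
            [nn.Linear(d_in, d_rep) for _ in range(n_modalities)])
        self.mask_embed = nn.Linear(n_modalities, d_rep)  # missingness pattern
        self.fuse = nn.Linear(d_rep * (n_modalities + 1), d_rep)

    def forward(self, feats, mask):
        # feats: list of (batch, d_in); mask: (batch, n_modalities) in {0,1}
        parts = [p(x) * mask[:, i:i + 1]            # zero out missing modalities
                 for i, (p, x) in enumerate(zip(self.proj, feats))]
        parts.append(self.mask_embed(mask))         # embed the pattern itself
        return self.fuse(torch.cat(parts, dim=-1))

class PatternRectifier(nn.Module):
    """Multitask head with a learned residual bias per missingness
    pattern (a simple stand-in for the paper's rectifier)."""
    def __init__(self, n_patterns: int, n_tasks: int, d_rep: int):
        super().__init__()
        self.head = nn.Linear(d_rep, n_tasks)
        self.delta = nn.Embedding(n_patterns, n_tasks)  # residual per pattern

    def forward(self, z, pattern_id):
        return self.head(z) + self.delta(pattern_id)

# Toy usage: 3 modalities (e.g. notes, structured data, imaging), 2 tasks
# (e.g. readmission, ICU admission); sizes are arbitrary.
fusion = MMNARFusion(n_modalities=3, d_in=16, d_rep=32)
rect = PatternRectifier(n_patterns=8, n_tasks=2, d_rep=32)
feats = [torch.randn(4, 16) for _ in range(3)]
mask = torch.tensor([[1, 1, 1], [1, 0, 1], [1, 1, 0], [0, 1, 1]])
pattern_id = (mask * torch.tensor([4, 2, 1])).sum(dim=1)  # binary pattern index
logits = rect(fusion(feats, mask.float()), pattern_id)
print(logits.shape)  # torch.Size([4, 2])
\end{verbatim}

The cross-modal reconstruction and contrastive losses described in module (2) would sit on top of these representations during training; they are omitted here to keep the sketch self-contained.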