Exploring multi-site dataset shifts in electronic health records using time series features
Abstract
Models developed using longitudinal electronic health record (EHR) data can generalize inconsistently to new data at different institutions. Rather than relying only on the external validity of performance, we consider how distributional shifts in EHR data can inform multi-site generalizability without the need for task-specific models or annotations. Extending statistical dataset shift detection to time series through feature-based temporal analysis, we compare the EHR data from five different institutions and four different prior patient conditions for patients requiring the administration of an inpatient diuretic. We illustrate which sites exhibit greater variability as well as the EHR measures contributing to the variation, providing valuable insight into downstream deployment.