Affinity Workshop: Women in Machine Learning

Shortcuts in Public Medical Image Datasets

Amelia Jiménez-Sánchez · Andreas Skovdal · Frederik Bechmann Faarup · Kasper Thorhauge Grønbek · Veronika Cheplygina


Artificial Intelligence (AI) holds promise for medical image analysis, and medical institutions are starting to integrate AI systems for screening and computer-aided diagnosis. However, recent studies show that even when algorithms perform well on existing data, they can learn “shortcuts,” such as the visibility of medical tools, and fail to generalize. We identify two such shortcuts: the presence of chest drains for pneumothorax classification, and the presence of text in images for breast cancer classification. The pneumothorax model achieved an Area Under the Curve (AUC) of 0.93 on the baseline set, 0.89 on the set without chest drains, and 0.96 on the set with chest drains. The breast cancer model achieved an AUC of 0.78, which dropped to 0.682 when the images that contain text were removed. This degradation in performance illustrates the risk of deploying these models clinically. In future work, we plan to investigate automatic ways to identify and avoid learning such shortcuts. In particular, we will research the use of metadata to improve the robustness of AI algorithms.
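
The comparison of AUC across subsets can be illustrated with a minimal sketch. It is not the authors' code: the abstract does not say how the subsets were built, so the sketch assumes each image carries a boolean flag marking the suspected shortcut (for example, a visible chest drain) and simply splits the test set on that flag. The function names `auc` and `auc_by_subset` and the toy labels and scores are hypothetical.

```python
def auc(labels, scores):
    """ROC AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties count as half a correct pair (Mann-Whitney U formulation).
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def auc_by_subset(labels, scores, has_shortcut):
    """AUC on the full set and on the subsets with / without the shortcut flag.

    Assumption: has_shortcut[i] is True when image i shows the suspected
    shortcut. How such flags are obtained is outside this sketch.
    """
    def subset(flag):
        idx = [i for i, h in enumerate(has_shortcut) if h == flag]
        return auc([labels[i] for i in idx], [scores[i] for i in idx])

    return {
        "all": auc(labels, scores),
        "with_shortcut": subset(True),
        "without_shortcut": subset(False),
    }


# Toy data, invented for illustration only (not results from the study).
labels = [1, 1, 0, 0, 1, 1, 0, 0]
scores = [0.9, 0.8, 0.2, 0.1, 0.6, 0.3, 0.5, 0.4]
has_shortcut = [True, True, True, True, False, False, False, False]

result = auc_by_subset(labels, scores, has_shortcut)
print(result)
```

A large gap between the "with_shortcut" and "without_shortcut" values is one signal that a model may be relying on the flagged feature rather than on the pathology itself.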