Fairness and robustness are often considered as orthogonal dimensions to evaluate machine learning on. Recent evidence has however displayed that fairness guarantees are not transferable across environments. In healthcare settings, this can result in e.g. a model that performs fairly (according to a selected metric) in hospital A showing unfairness when deployed in hospital B. Here we illustrate how fairness metrics may change under distribution shift using 2 real-world applications in Electronic Health Records (EHR) and Dermatology. We further show that clinically plausible shifts simultaneously affect multiple parts of the data generation process through a causal analysis. Such complex shifts invalidate most assumptions required by current mitigation techniques, which typically target either covariate or label shift. Our work hence displays a technical gap to a realistic problem and hopes to elicit further research at the intersection of fairness and robustness in real-world applications.