Poster
in
Workshop: Synthetic Data for Empowering ML Research
Improving dermatology classifiers across populations using images generated by large diffusion models
Luke Sagers · James Diao · Matt Groh · Pranav Rajpurkar · Adewole Adamson · Arjun Manrai
Abstract:
Dermatological classification algorithms developed without sufficiently diverse training data may generalize poorly across populations. While more intentional data collection and annotation is the best way to increase representation, new computational approaches for generating training data may also aid in reducing representation bias. In this paper, we show that DALL·E 2, a large text-to-image diffusion model, can generate synthetic and photorealistic skin disease images across skin types. Using the Fitzpatrick 17k dataset as a benchmark, we demonstrate that including DALL·E 2-generated synthetic images improves classification accuracy of skin disease models overall and particularly for underrepresented groups.
Chat is not available.