Timezone: »

Adapting Pretrained Vision-Language Foundational Models to Medical Imaging Domains
Pierre Chambon · Christian Bluethgen · Curtis Langlotz · Akshay Chaudhari
Event URL: https://openreview.net/forum?id=QtxbYdJVT8Q »

Multi-modal foundational models are trained on millions of pairs of natural images and texts, frequently obtained through web-crawling approaches. Although their performance is excellent, these models do not generalize well to other domains, such as medical imaging, especially when these domains do not resemble the centric-like images that can be found on the web. In this study, we assess the ability of the stable diffusion model to generate domain-specific images in the particular case of medical imaging. Based on quantitative and qualitative evaluations of the main components of the stable diffusion pipeline (the variational autoencoder, the U-Net and the text-encoder), we explore several approaches to fine-tune stable diffusion to generate radiological images, which accurately represent the clinical content of conditional text prompts. Our best-performing model improves upon the stable diffusion baseline and can be correctly conditioned to insert an abnormality on a synthetic radiology image.

Author Information

Pierre Chambon (Stanford University)
Christian Bluethgen (Stanford University)
Curtis Langlotz
Akshay Chaudhari (Stanford University)

More from the Same Authors