
Workshop: Synthetic Data Generation with Generative AI

Knowledge-Infused Prompting Improves Clinical Text Generation with Large Language Models

Ran Xu · Hejie Cui · Yue Yu · Xuan Kan · Wenqi Shi · Yuchen Zhuang · Wei Jin · Joyce Ho · Carl Yang

Keywords: [ Large language models ] [ Clinical NLP ] [ Synthetic Data Generation ] [ prompting ]


Clinical natural language processing requires methods that can handle domain-specific challenges, such as complex medical terminology and clinical contexts. Large language models (LLMs) have recently shown promise in this domain, yet deploying them directly can raise privacy concerns and is constrained by computational resources. To address these challenges, we propose ClinGen, a framework that infuses knowledge into synthetic clinical text generation with LLMs for clinical NLP tasks. ClinGen combines clinical knowledge extraction with context-informed LLM prompting: both clinical topics and writing styles are drawn from external domain-specific knowledge graphs and from LLMs to guide data generation. Extensive experiments across 7 clinical NLP tasks and 16 datasets show that ClinGen consistently improves performance across tasks, aligning the generated data with the distribution of real datasets and enriching the diversity of the generated training instances.
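To illustrate the idea of combining knowledge-graph topics with sampled writing styles, here is a minimal sketch of knowledge-infused prompt construction. This is not the authors' implementation; the knowledge-graph excerpt, the style list, and the `build_prompt` helper are all hypothetical placeholders for the kind of domain context the abstract describes.

```python
import random

# Hypothetical knowledge-graph excerpt: clinical entity -> related concepts.
# In practice these would come from an external domain-specific knowledge graph.
CLINICAL_KG = {
    "myocardial infarction": ["chest pain", "troponin elevation", "ST elevation"],
    "type 2 diabetes": ["hyperglycemia", "metformin", "elevated HbA1c"],
}

# Hypothetical writing styles, e.g. suggested by querying an LLM.
WRITING_STYLES = ["discharge summary", "nursing note", "radiology report"]

def build_prompt(task: str, seed: int = 0) -> str:
    """Compose a context-informed generation prompt for a clinical NLP task."""
    rng = random.Random(seed)
    topic = rng.choice(sorted(CLINICAL_KG))      # clinical topic from the KG
    concepts = ", ".join(CLINICAL_KG[topic])     # related KG neighbors as context
    style = rng.choice(WRITING_STYLES)           # sampled writing style
    return (
        f"Generate one training example for the task '{task}'.\n"
        f"Topic: {topic} (related concepts: {concepts}).\n"
        f"Write it in the style of a {style}."
    )

# The resulting prompt would then be sent to an LLM to produce one
# synthetic training instance; varying the seed varies topic and style.
print(build_prompt("medication extraction"))
```

Sampling both the topic and the style per instance is what steers the generated data toward the real clinical distribution while keeping the synthetic set diverse.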
