Workshop: Synthetic Data for Empowering ML Research

Medical Scientific Table-to-Text Generation with Synthetic Data under Data Sparsity Constraint

Heng-Yi Wu · Jingqing Zhang · Julia Ive · Tong Li · Vibhor Gupta · Bingyuan Chen · Yike Guo


An efficient table-to-text summarization system can drastically reduce the manual effort needed to understand tabular data and summarize it into textual reports. In practice, however, the task is heavily impeded by data sparsity and by the inability of state-of-the-art natural language generation models (such as T5, PEGASUS, and GPT-Neo) to produce coherent and accurate outputs. This is particularly true in pre-clinical and clinical domains. In this paper, we propose a novel table-to-text approach that tackles these problems with synthetic data generation and a copy mechanism. Experiments show that the proposed method improves the model's ability to copy concise and relevant information from tabular data when generating assay validation and toxicology reports.
