Poster in Workshop: Regulatable ML: Towards Bridging the Gaps between Machine Learning Research and Regulations

Where did you learn that?: Tracing the Impact of Training Data with Diffusion Model Ensembles

Zheng Dai · Rui-Jie Yew · David Gifford


Abstract:

The widespread adoption of diffusion models for creative uses such as image, video, and audio synthesis has raised serious legal and ethical concerns surrounding the use of training data and its regulation. Due to the size and complexity of these models, the effect of training data is difficult to characterize with existing methods, confounding regulatory efforts. In this work we propose a novel approach to trace the impact of training data using an encoded ensemble of diffusion models. In our approach, individual models in an ensemble are trained on encoded subsets of the overall training data to permit the identification of important training samples. The resulting ensemble allows us to efficiently remove the impact of any training sample. We demonstrate the viability of these ensembles for assessing influence and consider the regulatory implications of this work.
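
The abstract does not say how subsets are encoded, so the following is only an illustrative sketch under stated assumptions, not the authors' method. It assumes each training sample receives a distinct fixed-size set of ensemble members that train on it (a constant-weight code), and that "removing" a sample means keeping only the members that never saw it. All function names and parameters here are hypothetical.

```python
from itertools import combinations, islice


def assign_codes(num_samples, num_models, weight):
    """Assumed encoding: give each sample a distinct set of `weight`
    model indices; those models include the sample in their training set."""
    codes = list(islice(combinations(range(num_models), weight), num_samples))
    if len(codes) < num_samples:
        raise ValueError("not enough distinct codes for this many samples")
    return [frozenset(code) for code in codes]


def training_subsets(codes, num_models):
    """Invert the codes: for each model, list the sample indices it trains on."""
    subsets = [[] for _ in range(num_models)]
    for sample, code in enumerate(codes):
        for model in sorted(code):
            subsets[model].append(sample)
    return subsets


def models_without(codes, sample, num_models):
    """Ensemble members that never trained on `sample`."""
    return [m for m in range(num_models) if m not in codes[sample]]


# Toy example: 6 samples spread over 4 models, 2 models per sample.
codes = assign_codes(num_samples=6, num_models=4, weight=2)
subsets = training_subsets(codes, num_models=4)
clean = models_without(codes, sample=0, num_models=4)
```

In this toy setup, generating only from the models in `clean` yields outputs that sample 0 could not have influenced, and no retraining is needed. Comparing those outputs against the full ensemble's would be one way to probe the sample's influence, again as an assumption rather than a description of the paper's procedure.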
