Workshop: Workshop on Federated Learning in the Age of Foundation Models in Conjunction with NeurIPS 2023 (FL@FM-NeurIPS'23)

User Inference Attacks on Large Language Models

Nikhil Kandpal · Krishna Pillutla · Alina Oprea · Peter Kairouz · Christopher A. Choquette-Choo · Zheng Xu

Keywords: [ privacy ] [ user inference ] [ user data ] [ LLM privacy ]


We study the privacy implications of fine-tuning large language models (LLMs) on user-stratified (i.e., federated) data. We define a realistic threat model, called user inference, wherein an attacker infers whether a user's data was used for fine-tuning. We implement attacks for this threat model that require only a small set of samples from a user (possibly different from the samples used for training) and black-box access to the fine-tuned LLM. We find that LLMs are susceptible to user inference attacks across a variety of fine-tuning datasets, with outlier users (i.e., those with data distributions sufficiently different from other users) and users who contribute large quantities of data being most susceptible. Finally, we find that mitigation interventions in the training algorithm, such as batch or per-example gradient clipping and early stopping, fail to prevent user inference, while limiting the number of fine-tuning samples from a single user can reduce attack effectiveness (albeit at the cost of reducing the total amount of fine-tuning data).
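
As an illustration only, the threat model and the per-user data cap described above might be sketched as follows. The abstract does not specify the attack statistic, so this sketch assumes one plausible choice: the average log-likelihood ratio between the fine-tuned model and a reference model over the user's samples, compared against a threshold. The names `user_inference_score`, `infer_user_participation`, `cap_samples_per_user`, the `LogLikelihood` interface, and the default threshold are all hypothetical and are not taken from the paper.

```python
from collections import defaultdict
from typing import Callable, Dict, Iterable, List, Sequence, Tuple

# Hypothetical black-box interface (an assumption of this sketch): a function
# that returns the log-likelihood a model assigns to a text sample.
LogLikelihood = Callable[[str], float]


def user_inference_score(
    samples: Sequence[str],
    target_loglik: LogLikelihood,
    reference_loglik: LogLikelihood,
) -> float:
    """Average log-likelihood ratio over a user's samples.

    Assumed statistic: a higher value means the fine-tuned (target) model
    fits the user's text better than the reference model does.
    """
    if not samples:
        raise ValueError("at least one sample from the user is required")
    ratios = [target_loglik(x) - reference_loglik(x) for x in samples]
    return sum(ratios) / len(ratios)


def infer_user_participation(
    samples: Sequence[str],
    target_loglik: LogLikelihood,
    reference_loglik: LogLikelihood,
    threshold: float = 0.0,  # hypothetical; would need calibration in practice
) -> bool:
    """Guess that the user's data was used for fine-tuning if the score exceeds the threshold."""
    return user_inference_score(samples, target_loglik, reference_loglik) > threshold


def cap_samples_per_user(
    dataset: Iterable[Tuple[str, str]],
    max_per_user: int,
) -> List[Tuple[str, str]]:
    """Keep at most `max_per_user` (user_id, text) pairs per user, in input order."""
    counts: Dict[str, int] = defaultdict(int)
    kept: List[Tuple[str, str]] = []
    for user_id, text in dataset:
        if counts[user_id] < max_per_user:
            counts[user_id] += 1
            kept.append((user_id, text))
    return kept
```

In this sketch the attacker needs only a handful of the user's texts and per-sample likelihoods from the model, which matches the black-box setting in the abstract. The cap function shows why the mitigation trades off against data volume: every sample beyond the limit is simply dropped from the fine-tuning set.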
