SeqPATE: Differentially Private Text Generation via Knowledge Distillation
Zhiliang Tian · Yingxiu Zhao · Ziyue Huang · Yu-Xiang Wang · Nevin L. Zhang · He He

Thu Dec 01 02:00 PM -- 04:00 PM (PST) @ Hall J #233

Protecting the privacy of user data is crucial for text generation models, which can leak sensitive information during generation. Differentially private (DP) learning methods provide guarantees against identifying the existence of a training sample from model outputs. PATE is a recent DP learning algorithm that achieves high utility with strong privacy protection on training samples. However, text generation models output tokens sequentially in a large output space; the classic PATE algorithm is not customized for this setting. Furthermore, PATE works well to protect sample-level privacy, but is not designed to protect phrases in samples. In this paper, we propose SeqPATE, an extension of PATE to text generation that protects the privacy of individual training samples and sensitive phrases in training data. To adapt PATE to text generation, we generate pseudo-contexts and reduce the sequence generation problem to a next-word prediction problem. To handle the large output space, we propose a candidate filtering strategy to dynamically reduce the output space, and refine the teacher aggregation of PATE to avoid low agreement due to voting for a large number of candidates. To further reduce privacy losses, we use knowledge distillation to reduce the number of teacher queries. The experiments verify the effectiveness of SeqPATE in protecting both training samples and sensitive phrases.
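
As a rough illustration of the candidate filtering and teacher aggregation steps described in the abstract, the sketch below reduces a next-word prediction to a small candidate set and then aggregates teacher distributions with added noise. It is not the authors' implementation. The function names, the choice of filtering by the student's top-k tokens, the averaging of teacher probabilities, and the Gaussian noise scale `sigma` are all assumptions made for illustration, and the noise is not calibrated to any privacy budget.

```python
import random


def top_k_candidates(student_probs, k):
    """Return the indices of the k most probable tokens under the student.

    Assumption: candidates are chosen from the student's own next-word
    distribution, which shrinks the output space the teachers are asked about.
    """
    ranked = sorted(range(len(student_probs)),
                    key=lambda i: student_probs[i], reverse=True)
    return sorted(ranked[:k])


def noisy_aggregate(teacher_probs, candidates, sigma, rng):
    """Average teacher distributions over the candidates, add noise, renormalize.

    teacher_probs: one next-word distribution (list of floats) per teacher.
    sigma: standard deviation of the Gaussian noise (hypothetical value,
           not tied to a specific (epsilon, delta) guarantee).
    """
    n_teachers = len(teacher_probs)
    noisy = []
    for c in candidates:
        mean = sum(t[c] for t in teacher_probs) / n_teachers
        noisy.append(mean + rng.gauss(0.0, sigma))

    # Clip negatives introduced by the noise, then renormalize.
    clipped = [max(v, 0.0) for v in noisy]
    total = sum(clipped)
    if total == 0.0:
        # Fall back to a uniform distribution over the candidates.
        return {c: 1.0 / len(candidates) for c in candidates}
    return {c: v / total for c, v in zip(candidates, clipped)}


if __name__ == "__main__":
    rng = random.Random(0)
    student = [0.10, 0.50, 0.05, 0.35]      # toy vocabulary of 4 tokens
    teachers = [[0.1, 0.6, 0.1, 0.2],
                [0.1, 0.4, 0.1, 0.4]]
    cands = top_k_candidates(student, k=2)
    print(cands, noisy_aggregate(teachers, cands, sigma=0.05, rng=rng))
```

The student would then be trained on the returned distribution as a soft label. A real system would also need privacy accounting across queries, which this sketch omits.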

Author Information

Zhiliang Tian (National University of Defense Technology)
Yingxiu Zhao (The Hong Kong University of Science and Technology)
Ziyue Huang (Hong Kong University of Science and Technology)
Yu-Xiang Wang (UC Santa Barbara)
Nevin L. Zhang (HKUST)
He He (NYU)
