Private Synthetic Data for Multitask Learning and Marginal Queries
Giuseppe Vietri · Cedric Archambeau · Sergul Aydore · William Brown · Michael Kearns · Aaron Roth · Ankit Siva · Shuai Tang · Steven Wu

Tue Nov 29 02:00 PM -- 04:00 PM (PST) @ Hall J #417

We provide a differentially private algorithm for producing synthetic data that is simultaneously useful for multiple tasks: marginal queries and multitask machine learning (ML). A key innovation in our algorithm is the ability to handle numerical features directly, in contrast to a number of related prior approaches, which require numerical features to be first converted into high-cardinality categorical features via a binning strategy. Higher binning granularity is required for better accuracy, but it negatively impacts scalability. Eliminating the need for binning allows us to produce synthetic data that preserves large numbers of statistical queries, such as marginals on numerical features and class-conditional linear threshold queries. Preserving the latter means that the fraction of points of each class label above a particular half-space is roughly the same in both the real and synthetic data. This is the property needed to train a linear classifier in a multitask setting. Our algorithm also allows us to produce high-quality synthetic data for mixed marginal queries, which combine both categorical and numerical features. Our method consistently runs 2-5x faster than the best comparable techniques, and provides significant accuracy improvements in both marginal queries and linear prediction tasks for mixed-type datasets.
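To make the class-conditional linear threshold query concrete, here is a minimal Python sketch of how such a query could be evaluated and compared between a real and a synthetic dataset. It is an illustration only, not the authors' implementation: the function names, the parameterization of the half-space as `w . x > b`, and the choice to normalize by the full dataset size (rather than by the class count) are all assumptions made for this example, and no privacy mechanism is included.

```python
import numpy as np


def class_conditional_threshold_query(X, y, w, b, label):
    """Fraction of all rows that have class `label` AND lie above the
    half-space w . x > b. (Hypothetical helper; normalizing by the full
    dataset size is an assumption of this sketch.)"""
    above = X @ w > b              # boolean mask: which points are above the half-space
    in_class = y == label          # boolean mask: which points carry the class label
    return float(np.mean(above & in_class))


def max_query_error(real, synth, queries):
    """Largest absolute gap between real and synthetic answers over a
    collection of (w, b, label) queries. (Hypothetical helper.)"""
    X_r, y_r = real
    X_s, y_s = synth
    return max(
        abs(
            class_conditional_threshold_query(X_r, y_r, w, b, c)
            - class_conditional_threshold_query(X_s, y_s, w, b, c)
        )
        for (w, b, c) in queries
    )


# Toy example with two numerical features and binary labels.
X = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
y = np.array([0, 0, 1, 1])
w = np.array([1.0, 1.0])

print(class_conditional_threshold_query(X, y, w, 0.5, label=1))  # 0.5
print(class_conditional_threshold_query(X, y, w, 0.5, label=0))  # 0.25
```

A synthetic dataset that keeps `max_query_error` small over many such queries approximately matches the real data on these statistics; the queries operate on the raw numerical values, so no binning step appears anywhere in the sketch.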

Author Information

Giuseppe Vietri (University of Minnesota)
Cedric Archambeau (Amazon Web Services)
Sergul Aydore (AWS AI)
William Brown (Columbia University)
Michael Kearns (University of Pennsylvania)

Michael Kearns is Professor and National Center Chair in the Computer and Information Science department at the University of Pennsylvania. His research interests include topics in machine learning, algorithmic game theory, social networks, and computational finance. Prior to joining the Penn faculty, he spent a decade at AT&T/Bell Labs, where he was head of AI Research. He is co-director of Penn’s Warren Center for Network and Data Sciences (warrencenter.upenn.edu), and founder of Penn’s Networked and Social Systems Engineering (NETS) undergraduate program (www.nets.upenn.edu). Kearns consults extensively in technology and finance, and is a Fellow of the Association for the Advancement of Artificial Intelligence and the American Academy of Arts and Sciences.

Aaron Roth (University of Pennsylvania)
Ankit Siva (Amazon)
Shuai Tang (Amazon Web Services)
Steven Wu (Carnegie Mellon University)

