The workshop focuses on the problem of privacy-preserving machine learning in scenarios where sensitive datasets are distributed across multiple data owners. Such distributed scenarios occur quite often in practice, for example when different parties contribute different records to a dataset, or when information about each record is held by different owners. Different communities have developed approaches to deal with this problem, including differential-privacy-style techniques where noisy sketches are exchanged between the parties, homomorphic encryption where operations are performed on encrypted data, and tailored approaches using techniques from the field of secure multi-party computation. The workshop will serve as a forum to unify different perspectives on this problem and explore the relative merits of each approach. It will also serve as a venue for networking between researchers from the machine learning and secure multi-party computation communities interested in private learning, and for fostering fruitful long-term collaborations.
The workshop will place particular emphasis on the decentralization aspect of privacy-preserving machine learning. This covers a large number of realistic scenarios where the classical setup of differential privacy, with a "trusted curator" that prepares the data, cannot be directly applied. The problem of privacy-preserving computation gains relevance in this model, and effectively leveraging the tools developed by the cryptographic community to build private multi-party learning algorithms poses a remarkable challenge. Our program will include an introductory tutorial on secure multi-party computation for a machine learning audience, as well as talks by world-renowned experts from the machine learning and cryptography communities who have made high-quality contributions to this problem.
Fri 12:00 a.m. - 12:45 a.m. | Mariana Raykova — Secure Computation: Why, How and When (Invited Talk)
The goal of secure computation is to facilitate the evaluation of functionalities that depend on the private inputs of several distrusting parties in a privacy-preserving manner that minimizes the information revealed about the inputs. In this talk we will introduce example problems motivating work in the area of secure computation, including problems related to machine learning. We will discuss how we formalize the notion of privacy in cryptographic protocols and how we prove privacy-preserving properties for secure computation constructions. We will provide an overview of the main techniques and constructions for secure computation, including Yao's garbled circuits, approaches based on secret sharing, and others. Lastly, we will cover the different efficiency measures relevant for the practical use of secure computation protocols.
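To make the secret-sharing approach mentioned above concrete, here is a minimal, illustrative sketch (not taken from the talk) of a secure sum based on additive secret sharing: each party splits its private input into random shares that sum to the input modulo a public modulus, parties locally add the shares they hold, and only the total is reconstructed. The modulus, party count, and input values are arbitrary choices for the example.

```python
import secrets

# Public modulus for arithmetic secret sharing; all values live in Z_Q.
Q = 2**61 - 1

def share(value, n_parties):
    """Split `value` into n additive shares that sum to value mod Q."""
    shares = [secrets.randbelow(Q) for _ in range(n_parties - 1)]
    shares.append((value - sum(shares)) % Q)
    return shares

def reconstruct(shares):
    """Recombine shares; any strict subset of them reveals nothing about the value."""
    return sum(shares) % Q

# Each party secret-shares its private input among all parties.
inputs = [13, 25, 7]                                  # private values of 3 parties
all_shares = [share(x, 3) for x in inputs]

# Party j locally adds the j-th share of every input ...
partial_sums = [sum(s[j] for s in all_shares) % Q for j in range(3)]

# ... and only the reconstructed total is revealed: 13 + 25 + 7 = 45.
print(reconstruct(partial_sums))
```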
Fri 12:45 a.m. - 1:30 a.m. | Stratis Ioannidis — Secure Function Evaluation at Scale (Invited Talk)
Secure Function Evaluation (SFE) allows an interested party to evaluate a function over private data without learning anything about the inputs other than the outcome of this computation. This offers a strong privacy guarantee: SFE enables, e.g., a medical researcher, a statistician, or a data analyst, to conduct a study over private, sensitive data, without jeopardizing the privacy of the study's participants (patients, online users, etc.). Nevertheless, applying SFE to "big data" poses several challenges. First, beyond any computational overheads due to encryptions and decryptions, executing an algorithm securely may lead to a polynomial blowup in the total work (i.e., number of computation steps) compared to execution in the clear. For large datasets, even going from linear to quadratic time is prohibitive. Second, secure evaluations of algorithms should also maintain parallelizability: an algorithm that is easy to parallelize in the clear should also maintain this property in its SFE version, if its execution is to scale. Addressing this is challenging, as communication patterns between processors often reveal a lot about the private inputs. In this talk, we show that several machine learning and data mining algorithms can be executed securely while leading to only (a) a logarithmic blow-up in total work and (b) a logarithmic increase in the execution time when executed in parallel. Our techniques generalize to any algorithm that is graph-parallel, i.e., can be expressed through a sequence of scatter, gather, and apply operations. This includes several algorithms of great practical interest, including PageRank, matrix factorization, and training neural networks, to name a few.
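As a plaintext illustration of the graph-parallel (gather, apply, scatter) abstraction the talk builds on, the sketch below runs one of the mentioned algorithms, PageRank, on a toy graph. It only shows the structure that a secure version would have to preserve; the secure execution itself is not shown, and the graph and constants are made up for the example.

```python
# One PageRank iteration expressed in the gather-apply-scatter (GAS) style:
# gather neighbour contributions, apply the update, scatter the new value.

edges = [(0, 1), (0, 2), (1, 2), (2, 0)]      # toy directed graph
n, damping = 3, 0.85
rank = [1.0 / n] * n
out_degree = [sum(1 for s, _ in edges if s == v) for v in range(n)]

for _ in range(20):
    # Gather: each vertex sums the messages arriving on its incoming edges.
    gathered = [0.0] * n
    for src, dst in edges:
        gathered[dst] += rank[src] / out_degree[src]
    # Apply: update every vertex from its gathered value.
    rank = [(1 - damping) / n + damping * g for g in gathered]
    # Scatter is implicit here: the new rank is what neighbours read next round.

print([round(r, 3) for r in rank])
```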
Fri 2:00 a.m. - 2:30 a.m. | Jack Doerner — An Introduction to Practical MPC (Invited Talk)
The field of Secure Multiparty Computation (MPC) has seen an explosion of research over the past few years: though once a mostly theoretical idea, it has rapidly become a powerful, practical tool capable of efficiently solving real-world problems. However, this has come at the cost of dramatically increased complexity, expressed through a diversity of foundational techniques, high-level systems, and implementation folk knowledge. This talk will address the practical aspects of multiparty computation, discussing a number of low-level paradigms and their accompanying implementations, along with the various efficiency, functionality, and usability compromises that they offer. In addition, it will serve as an introduction to a set of tools and techniques that are commonly used in conjunction with generic MPC schemes, such as Oblivious RAM, permutation networks, and custom protocols. It is hoped that this will serve as a jumping-off point from which new problems can be addressed.
Fri 2:30 a.m. - 2:50 a.m. | AnonML: Anonymous Machine Learning Over a Network of Data Holders (Contributed Talk)
Bennett Cyphers, Kalyan Veeramachaneni
Centralized data warehouses can be disadvantageous to users for many reasons, including privacy, security, and control. We propose AnonML, a system for anonymous, peer-to-peer machine learning. At a high level, AnonML functions by moving as much computation as possible to its end users, away from a central authority. AnonML users store and compute features on their own data, thereby limiting the amount of information they need to share. To generate a model, a group of data-holding peers first agree on a model definition, a set of feature functions, and an aggregator, a peer who temporarily acts as a central authority. Each peer anonymously sends several small packets of labeled feature data to the aggregator. In exchange, the aggregator generates a classifier and shares it with the group. In this way, AnonML data holders control what information they share on a feature-by-feature and model-by-model basis, and peers are able to disassociate features from their digital identities. Additionally, each peer can generate models suited to their particular needs, and the whole network benefits from the creation of novel, useful models.
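The toy sketch below illustrates the data flow described in the abstract, with hypothetical agreed feature functions and an off-the-shelf classifier standing in for the agreed model; the anonymous peer-to-peer transport, which is the heart of AnonML, is deliberately omitted.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def agreed_features(record):
    # Feature functions the group agreed on; each peer runs these locally on raw data.
    return [record["age"] / 100.0, float(record["clicks"] > 10)]

# Each peer keeps its raw records and releases only small (features, label) packets.
peer_records = [
    ({"age": 34, "clicks": 3},  0),
    ({"age": 58, "clicks": 40}, 1),
    ({"age": 25, "clicks": 22}, 1),
    ({"age": 41, "clicks": 1},  0),
]
packets = [(agreed_features(rec), label) for rec, label in peer_records]

# The temporary aggregator fits the agreed model on the received packets
# and returns the resulting classifier to the group.
X = np.array([features for features, _ in packets])
y = np.array([label for _, label in packets])
model = LogisticRegression().fit(X, y)
print(model.predict(X))
```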
Fri 2:50 a.m. - 3:10 a.m. | Private Topic Modeling (Contributed Talk)
Mijung Park, James Foulds, Kamalika Chaudhuri, Max Welling
We develop a privatised stochastic variational inference method for Latent Dirichlet Allocation (LDA). The iterative nature of stochastic variational inference presents challenges: multiple iterations are required to obtain accurate posterior distributions, yet each iteration increases the amount of noise that must be added to achieve a reasonable degree of privacy. We propose a practical algorithm that overcomes this challenge by combining: (1) a relaxed notion of differential privacy, called concentrated differential privacy, which provides high-probability bounds on the cumulative privacy loss and is therefore well suited to iterative algorithms, rather than focusing on single-query loss; and (2) privacy amplification resulting from subsampling of large-scale data. Focusing on conjugate exponential family models, our private variational inference privatises all the posterior distributions by simply perturbing the expected sufficient statistics. Using Wikipedia data, we illustrate the effectiveness of our algorithm for large-scale data.
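As a rough illustration of the central step above, the sketch below perturbs expected sufficient statistics (per-topic expected word counts, as in an LDA-style update) with Gaussian noise before they enter the posterior update. The noise calibration, the concentrated differential privacy accounting, and the subsampling amplification from the paper are not reproduced here; all constants are placeholders.

```python
import numpy as np

rng = np.random.default_rng(0)

def privatise_sufficient_stats(expected_counts, sensitivity, noise_scale):
    """Add Gaussian noise to the expected sufficient statistics (placeholder scale)."""
    noise = rng.normal(0.0, sensitivity * noise_scale, size=expected_counts.shape)
    return np.maximum(expected_counts + noise, 0.0)   # keep counts non-negative

# Toy statistics: 2 topics x 5 vocabulary words, from one minibatch.
expected_counts = rng.uniform(0, 3, size=(2, 5))
noisy_counts = privatise_sufficient_stats(expected_counts,
                                           sensitivity=1.0, noise_scale=0.5)

# The variational posterior update then uses only the noisy statistics.
topic_word_posterior = 0.1 + noisy_counts             # Dirichlet prior eta = 0.1
print(topic_word_posterior)
```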
Fri 3:15 a.m. - 4:00 a.m. | Poster Spotlights (Short Talks)
Fri 5:30 a.m. - 6:00 a.m. | Practical Secure Aggregation for Federated Learning on User-Held Data (Contributed Talk)
Keith Bonawitz, Vladimir Ivanov, Ben Kreuter, Antonio Marcedone, Brendan McMahan, Sarvar Patel, Daniel Ramage, Aaron Segal, Karn Seth
Secure Aggregation is a class of Secure Multi-Party Computation algorithms wherein a group of mutually distrustful parties u ∈ U each hold a private value x_u and collaborate to compute an aggregate value, such as the sum ∑_{u∈U} x_u, without revealing to one another any information about their private values except what is learnable from the aggregate value itself. In this work, we consider training a deep neural network in the Federated Learning model, using distributed gradient descent across user-held training data on mobile devices, with Secure Aggregation protecting the privacy of each user's model gradient. We identify a combination of efficiency and robustness requirements which, to the best of our knowledge, are unmet by existing algorithms in the literature. We proceed to design a novel, communication-efficient Secure Aggregation protocol for high-dimensional data that tolerates up to 1/3 of users failing to complete the protocol. For 16-bit input values, our protocol offers 1.73× communication expansion for 2^10 users and 2^20-dimensional vectors, and 1.98× expansion for 2^14 users and 2^24-dimensional vectors.
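The cancellation idea at the core of such Secure Aggregation protocols can be sketched in a few lines: every pair of users agrees on a random mask that one adds and the other subtracts, so the server only ever sees masked values, yet the masks cancel in the sum. This toy version assumes the pairwise masks already exist and ignores the dropout handling, key agreement, and secret sharing that the paper's protocol adds.

```python
import secrets
from itertools import combinations

Q = 2**32                                   # inputs and masks live in Z_Q
users = [0, 1, 2, 3]
private = {0: 5, 1: 17, 2: 9, 3: 11}        # each user's private value x_u

# Pairwise masks s_{u,v}, assumed agreed out of band (e.g. via key agreement).
pair_mask = {(u, v): secrets.randbelow(Q) for u, v in combinations(users, 2)}

def masked_value(u):
    """What user u actually sends: its value plus/minus the masks it shares."""
    y = private[u]
    for (a, b), s in pair_mask.items():
        if a == u:
            y += s          # the lower-indexed user of the pair adds the mask
        elif b == u:
            y -= s          # the higher-indexed user subtracts it
    return y % Q

# The server sums the masked values; the masks cancel and only the total remains.
total = sum(masked_value(u) for u in users) % Q
print(total)                # -> 42 == 5 + 17 + 9 + 11
```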
Fri 6:30 a.m. - 7:30 a.m. | Poster Session (Posters)
Fri 7:30 a.m. - 8:15 a.m. | Richard Nock — Confidential Computing - Federate Private Data Analysis (Invited Talk)
Fri 8:15 a.m. - 9:00 a.m. | Dawn Song — Lessons and Open Challenges in Secure and Privacy-Preserving Machine Learning and Data Analytics (Invited Talk)
Author Information
Borja Balle (Amazon Research Cambridge)
Aurélien Bellet (INRIA)
David Evans (University of Virginia)
Adrià Gascón (University of Edinburgh)