Standard Bayesian inference algorithms are prohibitively expensive in the regime of modern large-scale data. Recent work has found that a small, weighted subset of data (a coreset) may be used in place of the full dataset during inference, taking advantage of data redundancy to reduce computational cost. However, this approach has limitations in the increasingly common setting of sensitive, high-dimensional data. Indeed, we prove that there are situations in which the Kullback-Leibler divergence between the optimal coreset and the true posterior grows with data dimension; and as coresets include a subset of the original data, they cannot be constructed in a manner that preserves individual privacy. We address both of these issues with a single unified solution, Bayesian pseudocoresets --- a small weighted collection of synthetic "pseudodata" --- along with a variational optimization method to select both pseudodata and weights. The use of pseudodata (as opposed to the original datapoints) enables both the summarization of high-dimensional data and the differentially private summarization of sensitive data. Real and synthetic experiments on high-dimensional data demonstrate that Bayesian pseudocoresets achieve significant improvements in posterior approximation error compared to traditional coresets, and that pseudocoresets provide privacy without a significant loss in approximation quality.
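The abstract describes selecting synthetic pseudodata and weights by variationally minimizing a divergence to the true posterior. As a minimal sketch (not the paper's algorithm), the idea can be illustrated on a conjugate Gaussian location model, where both the full posterior and the pseudocoreset posterior are Gaussian and the KL divergence is available in closed form. The model, the finite-difference gradient descent, the step sizes, and all variable names below are illustrative assumptions, not details taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
d, N = 20, 1000                      # data dimension, dataset size
theta = rng.normal(size=d)
X = theta + rng.normal(size=(N, d))  # x_i ~ N(theta, I)

# Conjugate model: prior theta ~ N(0, I), likelihood N(x; theta, I).
# A weighted dataset (points, weights) induces an isotropic Gaussian
# posterior with precision 1 + sum(w) and mean sum(w_m x_m) / (1 + sum(w)).
def posterior(points, weights):
    prec = 1.0 + weights.sum()
    mean = (weights[:, None] * points).sum(axis=0) / prec
    return mean, prec

mu_full, prec_full = posterior(X, np.ones(N))

def kl(params):
    """Closed-form KL( pseudocoreset posterior || full posterior )."""
    u, w = params[:d].reshape(1, d), params[d:]
    mean_w, prec_w = posterior(u, w)
    return 0.5 * (d * prec_full / prec_w
                  + prec_full * np.sum((mu_full - mean_w) ** 2)
                  - d + d * np.log(prec_w / prec_full))

# A single pseudopoint (M = 1): jointly optimize its location u and
# weight w by finite-difference gradient descent on the KL divergence.
params = np.concatenate([rng.normal(size=d), [1.0]])
kl_init = kl(params)
for _ in range(500):
    grad = np.zeros_like(params)
    for i in range(params.size):
        e = np.zeros_like(params)
        e[i] = 1e-5
        grad[i] = (kl(params + e) - kl(params - e)) / 2e-5
    params -= 1e-4 * grad
kl_final = kl(params)

# In this toy model one pseudopoint can be exact: setting u to the sample
# mean and w = N reproduces the full posterior, so the optimal KL is zero.
kl_opt = kl(np.concatenate([X.mean(axis=0), [float(N)]]))
```

The exactness of a single pseudopoint here is special to the conjugate Gaussian model; the point of the sketch is only that a pseudopoint's location and weight are free variational parameters, which is what lets pseudocoresets summarize data that no small weighted subset of the original datapoints could.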
Author Information
Dionysis Manousakas (University of Cambridge)
Zuheng Xu (University of British Columbia)
Cecilia Mascolo (University of Cambridge)
Trevor Campbell (University of British Columbia)
More from the Same Authors
- 2021 : COVID-19 Sounds: A Large-Scale Audio Dataset for Digital Respiratory Screening »
  Tong Xia · Dimitrios Spathis · Chloë Brown · J Ch · Andreas Grammenos · Jing Han · Apinan Hasthanasombat · Erika Bondareva · Ting Dang · Andres Floto · Pietro Cicuta · Cecilia Mascolo
- 2022 : Hybrid-EDL: Improving Evidential Deep Learning for Uncertainty Quantification on Imbalanced Data »
  Tong Xia · Jing Han · Lorena Qendro · Ting Dang · Cecilia Mascolo
- 2022 Poster: Bayesian inference via sparse Hamiltonian flows »
  Naitong Chen · Zuheng Xu · Trevor Campbell
- 2022 Poster: Fast Bayesian Coresets via Subsampling and Quasi-Newton Refinement »
  Cian Naik · Judith Rousseau · Trevor Campbell
- 2022 Poster: Parallel Tempering With a Variational Reference »
  Nikola Surjanovic · Saifuddin Syed · Alexandre Bouchard-Côté · Trevor Campbell
- 2021 Workshop: Your Model is Wrong: Robustness and misspecification in probabilistic modeling »
  Diana Cai · Sameer Deshpande · Michael Hughes · Tamara Broderick · Trevor Campbell · Nick Foti · Barbara Engelhardt · Sinead Williamson
- 2020 Poster: Federated Principal Component Analysis »
  Andreas Grammenos · Rodrigo Mendoza Smith · Jon Crowcroft · Cecilia Mascolo
- 2019 Poster: Sparse Variational Inference: Bayesian Coresets from Scratch »
  Trevor Campbell · Boyan Beronov
- 2019 Poster: Universal Boosting Variational Inference »
  Trevor Campbell · Xinglong Li