Poster
The Price of Fair PCA: One Extra Dimension
Samira Samadi · Uthaipon Tantipongpipat · Jamie Morgenstern · Mohit Singh · Santosh Vempala

Tue Dec 4th 05:00 -- 07:00 PM @ Room 517 AB #138

We investigate whether the standard dimensionality reduction technique of PCA inadvertently produces data representations with different fidelity for two different populations. We show on several real-world data sets that PCA has higher reconstruction error on population A than on population B (for example, women versus men, or lower- versus higher-educated individuals). This can happen even when the data set has a similar number of samples from A and B. This motivates our study of dimensionality reduction techniques that maintain similar fidelity for A and B. We define the notion of Fair PCA and give a polynomial-time algorithm for finding a low-dimensional representation of the data that is nearly optimal with respect to this measure. Finally, we show on real-world data sets that our algorithm can be used to efficiently generate a fair low-dimensional representation of the data.
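As a minimal illustration of the disparity the abstract describes (not the authors' code or their Fair PCA algorithm), the sketch below fits standard PCA on pooled data and compares the average reconstruction error of two groups. The group labels and data here are synthetic and purely hypothetical.

    import numpy as np
    from sklearn.decomposition import PCA

    rng = np.random.default_rng(0)
    X_A = rng.normal(size=(500, 20))             # samples from population A (synthetic)
    X_B = rng.normal(scale=2.0, size=(500, 20))  # samples from population B (synthetic)
    X = np.vstack([X_A, X_B])

    # Standard PCA fit on the pooled data, then reconstruct from the low-dimensional projection
    pca = PCA(n_components=5).fit(X)
    X_hat = pca.inverse_transform(pca.transform(X))

    # Per-sample squared reconstruction error, averaged within each group
    err = ((X - X_hat) ** 2).sum(axis=1)
    print("avg reconstruction error, group A:", err[:500].mean())
    print("avg reconstruction error, group B:", err[500:].mean())

Even with equal group sizes, the two averages can differ substantially, which is the gap that a fair low-dimensional representation aims to close.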

Author Information

Samira Samadi (Georgia Tech)
Tao (Uthaipon) Tantipongpipat (Georgia Tech)

Graduating PhD student in machine learning theory and optimization, with a strong background in mathematics and the algorithmic foundations of data science and hands-on implementations on real-world datasets. I strive for impact and efficiency while staying attentive to detail, enjoy public speaking, and am experienced in leading research projects. I have published many theoretical results at academic conferences and developed several optimized algorithms for public use. My research includes:

• Approximation algorithms for optimal design in statistics, also known as design of experiments (DoE), using combinatorial optimization; diverse and representative sampling.
• Differential privacy: the theory of privacy in growing databases; its deployment in deep learning models such as RNNs, LSTMs, autoencoders, and GANs; and its application to private synthetic data generation.
• Fairness in machine learning: fair principal component analysis (fair PCA) using convex optimization and randomized rounding to obtain a low-rank solution to a semidefinite program.

Other interests: model compression; privacy and security in machine learning; fair and explainable/interpretable machine learning.

Jamie Morgenstern (Georgia Tech)
Mohit Singh (Georgia Tech)
Santosh Vempala (Georgia Tech)
