In recent work, \cite{chierichetti2017fair} studied the following ``fair'' variants of classical clustering problems such as $k$-means and $k$-median: given a set of $n$ data points in $\mathbb{R}^d$ and a binary type associated with each data point, the goal is to cluster the points while ensuring that the proportion of each type in each cluster is roughly the same as its proportion in the whole dataset. Subsequent work has focused either on extending this setting to the case where each data point has multiple, non-disjoint sensitive types, such as race and gender \cite{bera2019fair}, or on addressing the fact that the clustering algorithms of the above work do not scale well. The main contribution of this paper is an approach to clustering with fairness constraints that involve {\em multiple, non-disjoint} attributes and that is {\em also scalable}. Our approach is based on novel constructions of coresets: for the $k$-median objective, we construct an $\varepsilon$-coreset of size $O(\Gamma k^2 \varepsilon^{-d})$, where $\Gamma$ is the number of distinct collections of groups that a point may belong to; for the $k$-means objective, we show how to construct an $\varepsilon$-coreset of size $O(\Gamma k^3 \varepsilon^{-d-1})$. The former result is the first known coreset construction for the fair clustering problem with the $k$-median objective. The latter result removes the dependence on the size of the full dataset present in the coreset of~\cite{schmidt2018fair} and generalizes that construction to multiple, non-disjoint attributes. Importantly, plugging our coresets into existing algorithms for fair clustering, such as that of~\cite{backurs2019scalable}, yields the fastest known algorithms in several cases. Empirically, we evaluate our approach on the \textbf{Adult} and \textbf{Bank} datasets and show that the coresets are much smaller than the full datasets; using coresets indeed reduces the running time of computing the fair clustering objective while keeping the resulting difference in objective value small.
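To make the problem statement concrete, the following is a minimal Python sketch, under our own assumptions, of three ingredients named in the abstract: the quantity $\Gamma$, the proportionality constraint, and the $k$-median/$k$-means cost of an assignment. The function names (`gamma`, `is_fair`, `clustering_cost`), the encoding of each point's group membership as a set of labels, and the additive tolerance `delta` used to formalize "roughly the same" are hypothetical illustrative choices, not the paper's definitions, and the sketch does not construct a coreset.

```python
import math


def gamma(memberships):
    """Number of distinct collections of groups occurring among the points.

    Each entry of `memberships` is the set of groups one point belongs to;
    groups may overlap (e.g. a point can be in both group "A" and group "S").
    """
    return len({frozenset(m) for m in memberships})


def is_fair(memberships, labels, delta):
    """Illustrative proportionality check (assumed formalization).

    Returns True if, in every cluster, each group's share of the cluster
    differs from that group's share of the whole dataset by at most `delta`.
    """
    n = len(labels)
    groups = set().union(*memberships)
    overall = {g: sum(g in m for m in memberships) / n for g in groups}
    for c in set(labels):
        in_cluster = [m for m, lab in zip(memberships, labels) if lab == c]
        for g in groups:
            share = sum(g in m for m in in_cluster) / len(in_cluster)
            if abs(share - overall[g]) > delta:
                return False
    return True


def clustering_cost(points, labels, centers, z, weights=None):
    """Cost of an explicit (not necessarily nearest-center) assignment.

    z = 1 gives the k-median cost and z = 2 the k-means cost. Optional
    per-point weights allow evaluating a weighted point set such as a coreset.
    """
    if weights is None:
        weights = [1.0] * len(points)
    return sum(w * math.dist(p, centers[lab]) ** z
               for p, lab, w in zip(points, labels, weights))


if __name__ == "__main__":
    points = [(0, 0), (0, 1), (5, 5), (5, 6)]
    memberships = [{"A"}, {"B"}, {"A", "S"}, {"B", "S"}]
    centers = [(0, 0), (5, 5)]
    print(gamma(memberships))
    for labels in ([0, 0, 1, 1], [0, 1, 1, 0]):
        print(labels, is_fair(memberships, labels, delta=0.1),
              clustering_cost(points, labels, centers, z=1))
```

In this toy example every point has a different collection of groups, so $\Gamma = 4$. Assigning points to their nearest center gives $k$-median cost 2 but places both members of group "S" in the same cluster, violating the constraint; the assignment `[0, 1, 1, 0]` satisfies it at a higher cost, which is the trade-off fair clustering has to navigate. Because fairness is checked per collection of groups, the number of distinct collections, rather than the number of points, is what enters the coreset sizes quoted above.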
Author Information
Lingxiao Huang (EPFL)
Shaofeng Jiang (Weizmann Institute of Science)
Nisheeth Vishnoi (Yale University)
More from the Same Authors
2021 Spotlight: Coresets for Time Series Clustering
Lingxiao Huang · K Sudhir · Nisheeth Vishnoi
2022 Spotlight: Lightning Talks 2A-2
Harikrishnan N B · Jianhao Ding · Juha Harviainen · Yizhen Wang · Lue Tao · Oren Mangoubi · Tong Bu · Nisheeth Vishnoi · Mohannad Alhanahnah · Mikko Koivisto · Aditi Kathpalia · Lei Feng · Nithin Nagaraj · Hongxin Wei · Xiaozhu Meng · Petteri Kaski · Zhaofei Yu · Tiejun Huang · Ke Wang · Jinfeng Yi · Jian Liu · Sheng-Jun Huang · Mihai Christodorescu · Songcan Chen · Somesh Jha
2022 Spotlight: Re-Analyze Gauss: Bounds for Private Matrix Approximation via Dyson Brownian Motion
Oren Mangoubi · Nisheeth Vishnoi
2022 Spotlight: Sampling from Log-Concave Distributions with Infinity-Distance Guarantees
Oren Mangoubi · Nisheeth Vishnoi
2022 Spotlight: Lightning Talks 2A-1
Caio Kalil Lauand · Ryan Strauss · Yasong Feng · lingyu gu · Alireza Fathollah Pour · Oren Mangoubi · Jianhao Ma · Binghui Li · Hassan Ashtiani · Yongqi Du · Salar Fattahi · Sean Meyn · Jikai Jin · Nisheeth Vishnoi · zengfeng Huang · Junier B Oliva · yuan zhang · Han Zhong · Tianyu Wang · John Hopcroft · Di Xie · Shiliang Pu · Liwei Wang · Robert Qiu · Zhenyu Liao
2022 Poster: Sampling from Log-Concave Distributions with Infinity-Distance Guarantees
Oren Mangoubi · Nisheeth Vishnoi
2022 Poster: Fair Ranking with Noisy Protected Attributes
Anay Mehrotra · Nisheeth Vishnoi
2022 Poster: Re-Analyze Gauss: Bounds for Private Matrix Approximation via Dyson Brownian Motion
Oren Mangoubi · Nisheeth Vishnoi
2022 Poster: Efficient Submodular Optimization under Noise: Local Search is Robust
Lingxiao Huang · Yuyi Wang · Chunxue Yang · Huanjian Zhou
2022 Poster: Coresets for Vertical Federated Learning: Regularized Linear Regression and $K$-Means Clustering
Lingxiao Huang · Zhize Li · Jialin Sun · Haoyu Zhao
2021 Poster: Fair Classification with Adversarial Perturbations
L. Elisa Celis · Anay Mehrotra · Nisheeth Vishnoi
2021 Poster: Coresets for Time Series Clustering
Lingxiao Huang · K Sudhir · Nisheeth Vishnoi
2020 Poster: Coresets for Regressions with Panel Data
Lingxiao Huang · K Sudhir · Nisheeth Vishnoi
2019 Poster: Online sampling from log-concave distributions
Holden Lee · Oren Mangoubi · Nisheeth Vishnoi