Timezone: »
Spotlight
Coresets for Clustering with Missing Values
Vladimir Braverman · Shaofeng Jiang · Robert Krauthgamer · Xuan Wu
@
We provide the first coreset for clustering points in $\mathbb{R}^d$ that have multiple missing values (coordinates). Previous coreset constructions only allow one missing coordinate. The challenge in this setting is that objective functions, like \kMeans, are evaluated only on the set of available (non-missing) coordinates, which varies across points. Recall that an $\epsilon$-coreset of a large dataset is a small proxy, usually a reweighted subset of points, that $(1+\epsilon)$-approximates the clustering objective for every possible center set.Our coresets for $k$-Means and $k$-Median clustering have size $(jk)^{O(\min(j,k))} (\epsilon^{-1} d \log n)^2$, where $n$ is the number of data points, $d$ is the dimension and $j$ is the maximum number of missing coordinates for each data point. We further design an algorithm to construct these coresets in near-linear time, and consequently improve a recent quadratic-time PTAS for $k$-Means with missing values [Eiben et al., SODA 2021] to near-linear time.We validate our coreset construction, which is based on importance sampling and is easy to implement, on various real data sets. Our coreset exhibits a flexible tradeoff between coreset size and accuracy, and generally outperforms the uniform-sampling baseline. Furthermore, it significantly speeds up a Lloyd's-style heuristic for $k$-Means with missing values.
Author Information
Vladimir Braverman (Johns Hopkins University)
Shaofeng Jiang (Peking University)
Robert Krauthgamer (Weizmann Institute)
Xuan Wu (Johns Hopkins Univeristy)
Related Events (a corresponding poster, oral, or spotlight)
-
2021 Poster: Coresets for Clustering with Missing Values »
Fri. Dec 10th 04:30 -- 06:00 PM Room
More from the Same Authors
-
2022 : Bidirectional Adaptive Communication for Heterogeneous Distributed Learning »
Dmitrii Avdiukhin · Vladimir Braverman · Nikita Ivkin · Sebastian Stich -
2022 : From Local to Global: Spectral-Inspired Graph Neural Networks »
Ningyuan Huang · Soledad Villar · Carey E Priebe · Da Zheng · Chengyue Huang · Lin Yang · Vladimir Braverman -
2022 Spotlight: Risk Bounds of Multi-Pass SGD for Least Squares in the Interpolation Regime »
Difan Zou · Jingfeng Wu · Vladimir Braverman · Quanquan Gu · Sham Kakade -
2022 Poster: The Power and Limitation of Pretraining-Finetuning for Linear Regression under Covariate Shift »
Jingfeng Wu · Difan Zou · Vladimir Braverman · Quanquan Gu · Sham Kakade -
2022 Poster: Risk Bounds of Multi-Pass SGD for Least Squares in the Interpolation Regime »
Difan Zou · Jingfeng Wu · Vladimir Braverman · Quanquan Gu · Sham Kakade -
2021 Poster: The Benefits of Implicit Regularization from SGD in Least Squares Problems »
Difan Zou · Jingfeng Wu · Vladimir Braverman · Quanquan Gu · Dean Foster · Sham Kakade -
2021 Poster: Accommodating Picky Customers: Regret Bound and Exploration Complexity for Multi-Objective Reinforcement Learning »
Jingfeng Wu · Vladimir Braverman · Lin Yang -
2021 Poster: Adversarial Robustness of Streaming Algorithms through Importance Sampling »
Vladimir Braverman · Avinatan Hassidim · Yossi Matias · Mariano Schain · Sandeep Silwal · Samson Zhou -
2019 Poster: Communication-efficient Distributed SGD with Sketching »
Nikita Ivkin · Daniel Rothchild · Enayat Ullah · Vladimir Braverman · Ion Stoica · Raman Arora -
2018 Poster: The Physical Systems Behind Optimization Algorithms »
Lin Yang · Raman Arora · Vladimir Braverman · Tuo Zhao -
2018 Poster: Differentially Private Robust Low-Rank Approximation »
Raman Arora · Vladimir Braverman · Jalaj Upadhyay -
2017 : Poster Session »
Tsz Kit Lau · Johannes Maly · Nicolas Loizou · Christian Kroer · Yuan Yao · Youngsuk Park · Reka Agnes Kovacs · Dong Yin · Vlad Zhukov · Woosang Lim · David Barmherzig · Dimitris Metaxas · Bin Shi · Rajan Udwani · William Brendel · Yi Zhou · Vladimir Braverman · Sijia Liu · Eugene Golikov