Poster
List-Decodable Mean Estimation in Nearly-PCA Time
Ilias Diakonikolas · Daniel Kane · Daniel Kongsgaard · Jerry Li · Kevin Tian
Keywords: [ Clustering ] [ Theory ]
Abstract:
Robust statistics has traditionally focused on designing estimators tolerant to a minority of contaminated data. {\em List-decodable learning}~\cite{CharikarSV17} studies the more challenging regime where only a minority $\alpha$ fraction of the dataset, $\alpha \in (0, \tfrac{1}{2})$, is drawn from the distribution of interest, and no assumptions are made on the remaining data. We study the fundamental task of list-decodable mean estimation in high dimensions. Our main result is a new algorithm for bounded covariance distributions with optimal sample complexity and near-optimal error guarantee, running in {\em nearly-PCA time}. Assuming the ground truth distribution on $\mathbb{R}^d$ has identity-bounded covariance, our algorithm outputs $O(\tfrac{1}{\alpha})$ candidate means, one of which is within distance $\widetilde{O}(\alpha^{-1/2})$ from the truth. Our algorithm runs in time $\widetilde{O}(\tfrac{nd}{\alpha})$, where $n$ is the dataset size. This runtime nearly matches the cost of performing $\tfrac{1}{\alpha}$-PCA on the data, a natural bottleneck of known algorithms for (very) special cases of our problem, such as clustering well-separated mixtures. Prior to our work, the fastest runtimes were $\widetilde{O}(n^2 d \alpha^{-2})$~\cite{DiakonikolasKK20} and $\widetilde{O}(nd\,\alpha^{-c})$~\cite{CherapanamjeriMY20} for an unspecified constant $c$. Our approach builds on a novel soft downweighting method we term SIFT, arguably the simplest known polynomial-time mean estimator in the list-decodable setting. To develop our fast algorithms, we reduce the computational cost of SIFT via a careful ``win-win-win'' analysis of an approximate Ky Fan matrix multiplicative weights procedure we develop, which may be of independent interest.
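As a loose illustration of the soft downweighting idea the abstract describes, the toy sketch below robustly estimates a mean by repeatedly scoring points along the top principal direction of the weighted covariance and multiplicatively shrinking the weights of high-scoring (outlier-looking) points. This is {\em not} the paper's SIFT algorithm or its list-decodable (minority-inlier) setting; all function names, parameters, and the majority-inlier test data are illustrative assumptions.

```python
import numpy as np

def top_eigvec(M, iters=100):
    """Approximate top eigenvector of a symmetric PSD matrix via power iteration."""
    v = np.ones(M.shape[0]) / np.sqrt(M.shape[0])
    for _ in range(iters):
        v = M @ v
        v /= np.linalg.norm(v)
    return v

def soft_downweight_mean(X, rounds=50):
    """Toy soft-downweighting mean estimator (illustrative, not the paper's SIFT)."""
    n, d = X.shape
    w = np.full(n, 1.0 / n)
    for _ in range(rounds):
        mu = w @ X                       # weighted mean (w always sums to 1)
        Y = X - mu
        cov = (Y * w[:, None]).T @ Y     # weighted covariance
        v = top_eigvec(cov)
        tau = (Y @ v) ** 2               # outlier scores along the top direction
        w = w * (1.0 - tau / tau.max())  # soft multiplicative downweighting
        w /= w.sum()
    return w @ X

# Majority-inlier demo: 80% inliers with mean 0, 20% grossly shifted outliers.
rng = np.random.default_rng(0)
d = 20
inliers = rng.normal(0.0, 1.0, size=(400, d))
outliers = rng.normal(0.0, 1.0, size=(100, d))
outliers[:, 0] += 10.0                   # corruption along the first coordinate
X = np.vstack([inliers, outliers])

naive = X.mean(axis=0)                   # pulled toward the outliers
robust = soft_downweight_mean(X)         # stays near the true mean (origin)
print(np.linalg.norm(naive), np.linalg.norm(robust))
```

The per-round cost is dominated by forming the weighted covariance and power iteration, i.e., matrix-vector products against the data, which is the kind of PCA-style bottleneck the paper's runtime is measured against.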