Abstract:
-means++ \cite{arthur2007k} is a widely used clustering algorithm that is easy to implement, has nice theoretical guarantees and strong empirical performance. Despite its wide adoption, -means++ sometimes suffers from being slow on large data-sets so a natural question has been to obtain more efficient algorithms with similar guarantees. In this paper, we present such a near linear time algorithm for -means++ seeding. Interestingly our algorithm obtains the same theoretical guarantees as -means++ and significantly improves earlier results on fast -means++ seeding. Moreover, we show empirically that our algorithm is significantly faster than -means++ and obtains solutions of equivalent quality.
Chat is not available.