Timezone: »

Oral
Approximation Bounds for Hierarchical Clustering: Average Linkage, Bisecting K-means, and Local Search
Benjamin Moseley · Joshua Wang

Tue Dec 05 11:10 AM -- 11:25 AM (PST) @ Hall A
Hierarchical clustering is a data analysis method that has been used for decades. Despite its widespread use, there is a lack of an analytical foundation for the method. Having such a foundation would both support the methods currently used and guide future improvements. This paper gives an applied algorithmic foundation for hierarchical clustering. The goal of this paper is to give an analytic framework supporting observations seen in practice. This paper considers the dual of a problem framework for hierarchical clustering introduced by Dasgupta. The main results are that one of the most popular algorithms used in practice, average-linkage agglomerative clustering, has a small constant approximation ratio. Further, this paper establishes that using recursive $k$-means divisive clustering has a very poor lower bound on its approximation ratio, perhaps explaining why is it not as popular in practice. Motivated by the poor performance of $k$-means, we seek to find divisive algorithms that do perform well theoretically and this paper gives two constant approximation algorithms. This paper represents some of the first work giving a foundation for hierarchical clustering algorithms used in practice.