Timezone: »
Oral
Approximation Bounds for Hierarchical Clustering: Average Linkage, Bisecting K-means, and Local Search
Benjamin Moseley · Joshua Wang
Hierarchical clustering is a data analysis method that has been used for decades. Despite its widespread use, there is a lack of an analytical foundation for the method. Having such a foundation would both support the methods currently used and guide future improvements. This paper gives an applied algorithmic foundation for hierarchical clustering. The goal of this paper is to give an analytic framework supporting observations seen in practice. This paper considers the dual of a problem framework for hierarchical clustering introduced by Dasgupta. The main results are that one of the most popular algorithms used in practice, average-linkage agglomerative clustering, has a small constant approximation ratio. Further, this paper establishes that using recursive $k$-means divisive clustering has a very poor lower bound on its approximation ratio, perhaps explaining why is it not as popular in practice. Motivated by the poor performance of $k$-means, we seek to find divisive algorithms that do perform well theoretically and this paper gives two constant approximation algorithms. This paper represents some of the first work giving a foundation for hierarchical clustering algorithms used in practice.
Author Information
Benjamin Moseley (Carnegie Mellon University)
Joshua Wang (Google)
Related Events (a corresponding poster, oral, or spotlight)
-
2017 Poster: Approximation Bounds for Hierarchical Clustering: Average Linkage, Bisecting K-means, and Local Search »
Wed. Dec 6th 02:30 -- 06:30 AM Room Pacific Ballroom #30
More from the Same Authors
-
2023 Poster: Online List Labeling with Predictions »
Samuel McCauley · Benjamin Moseley · Aidin Niaparast · Shikha Singh -
2022 Poster: Algorithms with Prediction Portfolios »
Michael Dinitz · Sungjin Im · Thomas Lavastida · Benjamin Moseley · Sergei Vassilvitskii -
2021 : AI workloads inside databases »
Guy Van den Broeck · Alexander Ratner · Benjamin Moseley · Konstantinos Karanasos · Parisa Kordjamshidi · Molham Aref · Arun Kumar -
2021 Poster: Robust Online Correlation Clustering »
Silvio Lattanzi · Benjamin Moseley · Sergei Vassilvitskii · Yuyan Wang · Rudy Zhou -
2021 Poster: Margin-Independent Online Multiclass Learning via Convex Geometry »
Guru Guruganesh · Allen Liu · Jon Schneider · Joshua Wang -
2021 Oral: Faster Matchings via Learned Duals »
Michael Dinitz · Sungjin Im · Thomas Lavastida · Benjamin Moseley · Sergei Vassilvitskii -
2021 Poster: Faster Matchings via Learned Duals »
Michael Dinitz · Sungjin Im · Thomas Lavastida · Benjamin Moseley · Sergei Vassilvitskii -
2020 Poster: Fair Hierarchical Clustering »
Sara Ahmadian · Alessandro Epasto · Marina Knittel · Ravi Kumar · Mohammad Mahdian · Benjamin Moseley · Philip Pham · Sergei Vassilvitskii · Yuyan Wang -
2019 Poster: Efficient Rematerialization for Deep Networks »
Ravi Kumar · Manish Purohit · Zoya Svitkina · Erik Vee · Joshua Wang -
2019 Poster: Backprop with Approximate Activations for Memory-efficient Network Training »
Ayan Chakrabarti · Benjamin Moseley -
2019 Poster: Cost Effective Active Search »
Shali Jiang · Roman Garnett · Benjamin Moseley -
2018 Poster: Efficient nonmyopic batch active search »
Shali Jiang · Gustavo Malkomes · Matthew Abbott · Benjamin Moseley · Roman Garnett -
2018 Spotlight: Efficient nonmyopic batch active search »
Shali Jiang · Gustavo Malkomes · Matthew Abbott · Benjamin Moseley · Roman Garnett -
2018 Poster: Optimal Algorithms for Continuous Non-monotone Submodular and DR-Submodular Maximization »
Rad Niazadeh · Tim Roughgarden · Joshua Wang -
2018 Oral: Optimal Algorithms for Continuous Non-monotone Submodular and DR-Submodular Maximization »
Rad Niazadeh · Tim Roughgarden · Joshua Wang