Timezone: »
Clustering is a basic data mining task with a wide variety of applications. Not surprisingly, there exist many clustering algorithms. However, clustering is an ill defined problem - given a data set, it is not clear what a “correct” clustering for that set is. Indeed, different algorithms may yield dramatically different outputs for the same input sets. Faced with a concrete clustering task, a user needs to choose an appropriate clustering algorithm. Currently, such decisions are often made in a very ad hoc, if not completely random, manner. Given the crucial effect of the choice of a clustering algorithm on the resulting clustering, this state of affairs is truly regrettable. In this paper we address the major research challenge of developing tools for helping users make more informed decisions when they come to pick a clustering tool for their data. This is, of course, a very ambitious endeavor, and in this paper, we make some first steps towards this goal. We propose to address this problem by distilling abstract properties of the input-output behavior of different clustering paradigms.
In this paper, we demonstrate how abstract, intuitive properties of clustering functions can be used to taxonomize a set of popular clustering algorithmic paradigms. On top of addressing deterministic clustering algorithms, we also propose similar properties for randomized algorithms and use them to highlight functional differences between different common implementations of k-means clustering. We also study relationships between the properties, independent of any particular algorithm. In particular, we strengthen Kleinbergs famous impossibility result, while providing a simpler proof.
Author Information
Margareta Ackerman (Florida State University)
Shai Ben-David (University of Waterloo)
David R Loker (University of Waterloo)
More from the Same Authors
-
2016 Poster: Clustering with Same-Cluster Queries »
Hassan Ashtiani · Shrinu Kushagra · Shai Ben-David -
2016 Oral: Clustering with Same-Cluster Queries »
Hassan Ashtiani · Shrinu Kushagra · Shai Ben-David -
2015 : Domain Adaptation for Binary Classification »
Shai Ben-David -
2015 : Discussion Panel »
Tim van Erven · Wouter Koolen · Peter Grünwald · Shai Ben-David · Dylan Foster · Satyen Kale · Gergely Neu -
2015 : Clustering Is Easy When... »
Shai Ben-David -
2014 Poster: Incremental Clustering: The Case for Extra Clusters »
Margareta Ackerman · Sanjoy Dasgupta -
2013 Workshop: New Directions in Transfer and Multi-Task: Learning Across Domains and Tasks »
Urun Dogan · Marius Kloft · Tatiana Tommasi · Francesco Orabona · Massimiliano Pontil · Sinno Jialin Pan · Shai Ben-David · Arthur Gretton · Fei Sha · Marco Signoretto · Rajhans Samdani · Yun-Qian Miao · Mohammad Gheshlaghi azar · Ruth Urner · Christoph Lampert · Jonathan How -
2009 Workshop: Clustering: Science or art? Towards principled approaches »
Margareta Ackerman · Shai Ben-David · Avrim Blum · Isabelle Guyon · Ulrike von Luxburg · Robert Williamson · Reza Zadeh -
2008 Workshop: New Challanges in Theoretical Machine Learning: Data Dependent Concept Spaces »
Maria-Florina F Balcan · Shai Ben-David · Avrim Blum · Kristiaan Pelckmans · John Shawe-Taylor -
2008 Poster: Measures of Clustering Quality: A Working Set of Axioms for Clustering »
Shai Ben-David · Margareta Ackerman -
2008 Oral: Measures of Clustering Quality: A Working Set of Axioms for Clustering »
Shai Ben-David · Margareta Ackerman -
2006 Poster: Analysis of Representations for Domain Adaptation »
John Blitzer · Shai Ben-David · Yacov Crammer · Fernando Pereira