Timezone: »
Oral
Clustering with Same-Cluster Queries
Hassan Ashtiani · Shrinu Kushagra · Shai Ben-David
We propose a framework for Semi-Supervised Active Clustering framework (SSAC), where the learner is allowed to interact with a domain expert, asking whether two given instances belong to the same cluster or not. We study the query and computational complexity of clustering in this framework. We consider a setting where the expert conforms to a center-based clustering with a notion of margin. We show that there is a trade off between computational complexity and query complexity; We prove that for the case of $k$-means clustering (i.e., when the expert conforms to a solution of $k$-means), having access to relatively few such queries allows efficient solutions to otherwise NP hard problems. In particular, we provide a probabilistic polynomial-time (BPP) algorithm for clustering in this setting that asks $O\big(k^2\log k + k\log n)$ same-cluster queries and runs with time complexity $O\big(kn\log n)$ (where $k$ is the number of clusters and $n$ is the number of instances). The success of the algorithm is guaranteed for data satisfying the margin condition under which, without queries, we show that the problem is NP hard. We also prove a lower bound on the number of queries needed to have a computationally efficient clustering algorithm in this setting.
Author Information
Hassan Ashtiani (University of Waterloo)
Shrinu Kushagra (University of Waterloo)
Shai Ben-David (U. Waterloo)
More from the Same Authors
-
2016 Poster: Clustering with Same-Cluster Queries »
Hassan Ashtiani · Shrinu Kushagra · Shai Ben-David -
2015 : Domain Adaptation for Binary Classification »
Shai Ben-David -
2015 : Discussion Panel »
Tim van Erven · Wouter Koolen · Peter Grünwald · Shai Ben-David · Dylan Foster · Satyen Kale · Gergely Neu -
2015 : Clustering Is Easy When... »
Shai Ben-David -
2013 Workshop: New Directions in Transfer and Multi-Task: Learning Across Domains and Tasks »
Urun Dogan · Marius Kloft · Tatiana Tommasi · Francesco Orabona · Massimiliano Pontil · Sinno Jialin Pan · Shai Ben-David · Arthur Gretton · Fei Sha · Marco Signoretto · Rajhans Samdani · Yun-Qian Miao · Mohammad Gheshlaghi azar · Ruth Urner · Christoph Lampert · Jonathan How -
2010 Poster: Towards Property-Based Classification of Clustering Paradigms »
Margareta Ackerman · Shai Ben-David · David R Loker -
2009 Workshop: Clustering: Science or art? Towards principled approaches »
Margareta Ackerman · Shai Ben-David · Avrim Blum · Isabelle Guyon · Ulrike von Luxburg · Robert Williamson · Reza Zadeh -
2008 Workshop: New Challanges in Theoretical Machine Learning: Data Dependent Concept Spaces »
Maria-Florina F Balcan · Shai Ben-David · Avrim Blum · Kristiaan Pelckmans · John Shawe-Taylor -
2008 Poster: Measures of Clustering Quality: A Working Set of Axioms for Clustering »
Shai Ben-David · Margareta Ackerman -
2008 Oral: Measures of Clustering Quality: A Working Set of Axioms for Clustering »
Shai Ben-David · Margareta Ackerman -
2006 Poster: Analysis of Representations for Domain Adaptation »
John Blitzer · Shai Ben-David · Yacov Crammer · Fernando Pereira