Timezone: »
Despite the ubiquity of clustering as a tool in unsupervised learning, there is not yet
a consensus on a formal theory, and the vast majority
of work in this direction has focused on unsupervised clustering.
We study a recently proposed framework for supervised clustering
where there is access to a teacher.
We give an improved generic algorithm to cluster any concept class
in that model. Our algorithm is query-efficient in the sense that it involves only a small amount
of interaction with the teacher. We also present and study two natural generalizations of the model.
The model assumes that the teacher response to the algorithm is perfect. We eliminate
this limitation by proposing a noisy model and give an algorithm for
clustering the class of intervals in this noisy model. We also propose a dynamic model where the teacher sees
a random subset of the points. Finally, for datasets
satisfying a spectrum of weak to strong properties, we give query bounds, and show that a class
of clustering functions containing Single-Linkage will find the target clustering under the strongest
property.
Author Information
Pranjal Awasthi (Princeton)
Reza Zadeh (Matroid and Stanford)
Reza Bosagh Zadeh is Founder CEO at Matroid and an Adjunct Professor at Stanford University. His work focuses on Machine Learning, Distributed Computing, and Discrete Applied Mathematics. Reza received his PhD in Computational Mathematics from Stanford under the supervision of Gunnar Carlsson. His awards include a KDD Best Paper Award and the Gene Golub Outstanding Thesis Award. He has served on the Technical Advisory Boards of Microsoft and Databricks. As part of his research, Reza built the Machine Learning Algorithms behind Twitter's who-to-follow system, the first product to use Machine Learning at Twitter. Reza is the initial creator of the Linear Algebra Package in Apache Spark. Through Apache Spark, Reza's work has been incorporated into industrial and academic cluster computing environments. In addition to research, Reza designed and teaches two PhD-level classes at Stanford: Distributed Algorithms and Optimization (CME 323), and Discrete Mathematics and Algorithms (CME 305).
Related Events (a corresponding poster, oral, or spotlight)
-
2010 Poster: Supervised Clustering »
Wed. Dec 8th 08:00 -- 08:00 AM Room
More from the Same Authors
-
2015 Poster: On some provably correct cases of variational inference for topic models »
Pranjal Awasthi · Andrej Risteski -
2015 Spotlight: On some provably correct cases of variational inference for topic models »
Pranjal Awasthi · Andrej Risteski -
2014 Workshop: Distributed Machine Learning and Matrix Computations »
Reza Zadeh · Ion Stoica · Ameet S Talwalkar -
2014 Poster: Learning Mixtures of Ranking Models »
Pranjal Awasthi · Avrim Blum · Or Sheffet · Aravindan Vijayaraghavan -
2014 Spotlight: Learning Mixtures of Ranking Models »
Pranjal Awasthi · Avrim Blum · Or Sheffet · Aravindan Vijayaraghavan -
2013 Workshop: Large Scale Matrix Analysis and Inference »
Reza Zadeh · Gunnar Carlsson · Michael Mahoney · Manfred K. Warmuth · Wouter M Koolen · Nati Srebro · Satyen Kale · Malik Magdon-Ismail · Ashish Goel · Matei A Zaharia · David Woodruff · Ioannis Koutis · Benjamin Recht -
2009 Workshop: Clustering: Science or art? Towards principled approaches »
Margareta Ackerman · Shai Ben-David · Avrim Blum · Isabelle Guyon · Ulrike von Luxburg · Robert Williamson · Reza Zadeh