Timezone: »

Supervised Clustering
Pranjal Awasthi · Reza Zadeh

Wed Dec 08 03:15 PM -- 03:20 PM (PST) @ Regency Ballroom

Despite the ubiquity of clustering as a tool in unsupervised learning, there is not yet
a consensus on a formal theory, and the vast majority
of work in this direction has focused on unsupervised clustering.
We study a recently proposed framework for supervised clustering
where there is access to a teacher.
We give an improved generic algorithm to cluster any concept class
in that model. Our algorithm is query-efficient in the sense that it involves only a small amount
of interaction with the teacher. We also present and study two natural generalizations of the model.
The model assumes that the teacher response to the algorithm is perfect. We eliminate
this limitation by proposing a noisy model and give an algorithm for
clustering the class of intervals in this noisy model. We also propose a dynamic model where the teacher sees
a random subset of the points. Finally, for datasets
satisfying a spectrum of weak to strong properties, we give query bounds, and show that a class
of clustering functions containing Single-Linkage will find the target clustering under the strongest

Author Information

Pranjal Awasthi (Princeton)
Reza Zadeh (Matroid and Stanford)

Reza Bosagh Zadeh is Founder CEO at Matroid and an Adjunct Professor at Stanford University. His work focuses on Machine Learning, Distributed Computing, and Discrete Applied Mathematics. Reza received his PhD in Computational Mathematics from Stanford under the supervision of Gunnar Carlsson. His awards include a KDD Best Paper Award and the Gene Golub Outstanding Thesis Award. He has served on the Technical Advisory Boards of Microsoft and Databricks. As part of his research, Reza built the Machine Learning Algorithms behind Twitter's who-to-follow system, the first product to use Machine Learning at Twitter. Reza is the initial creator of the Linear Algebra Package in Apache Spark. Through Apache Spark, Reza's work has been incorporated into industrial and academic cluster computing environments. In addition to research, Reza designed and teaches two PhD-level classes at Stanford: Distributed Algorithms and Optimization (CME 323), and Discrete Mathematics and Algorithms (CME 305).

Related Events (a corresponding poster, oral, or spotlight)

More from the Same Authors