Timezone: »
Algorithms for clustering points in metric spaces is a long-studied area of research. Clustering has seen a multitude of work both theoretically, in understanding the approximation guarantees possible for many objective functions such as k-median and k-means clustering, and experimentally, in finding the fastest algorithms and seeding procedures for Lloyd's algorithm. The performance of a given clustering algorithm depends on the specific application at hand, and this may not be known up front. For example, a "typical instance" may vary depending on the application, and different clustering heuristics perform differently depending on the instance.
In this paper, we define an infinite family of algorithms generalizing Lloyd's algorithm, with one parameter controlling the the initialization procedure, and another parameter controlling the local search procedure. This family of algorithms includes the celebrated k-means++ algorithm, as well as the classic farthest-first traversal algorithm. We design efficient learning algorithms which receive samples from an application-specific distribution over clustering instances and learn a near-optimal clustering algorithm from the class. We show the best parameters vary significantly across datasets such as MNIST, CIFAR, and mixtures of Gaussians. Our learned algorithms never perform worse than k-means++, and on some datasets we see significant improvements.
Author Information
Maria-Florina Balcan (Carnegie Mellon University)
Travis Dick (Carnegie Mellon University)
Colin White (Carnegie Mellon University)
Related Events (a corresponding poster, oral, or spotlight)
-
2018 Spotlight: Data-Driven Clustering via Parameterized Lloyd's Families »
Tue. Dec 4th 09:50 -- 09:55 PM Room Room 517 CD
More from the Same Authors
-
2021 Spotlight: Sample Complexity of Tree Search Configuration: Cutting Planes and Beyond »
Maria-Florina Balcan · Siddharth Prasad · Tuomas Sandholm · Ellen Vitercik -
2021 : Synthetic Benchmarks for Scientific Research in Explainable Machine Learning »
Yang Liu · Sujay Khandagale · Colin White · Willie Neiswanger -
2022 : AutoML for Climate Change: A Call to Action »
Renbo Tu · Nicholas Roberts · Vishak Prasad C · Sibasis Nayak · Paarth Jain · Frederic Sala · Ganesh Ramakrishnan · Ameet Talwalkar · Willie Neiswanger · Colin White -
2022 : On the Importance of Architectures and Hyperparameters for Fairness in Face Recognition »
Samuel Dooley · Rhea Sukthanker · John Dickerson · Colin White · Frank Hutter · Micah Goldblum -
2022 : On the Importance of Architectures and Hyperparameters for Fairness in Face Recognition »
Samuel Dooley · Rhea Sukthanker · John Dickerson · Colin White · Frank Hutter · Micah Goldblum -
2022 Poster: Structural Analysis of Branch-and-Cut and the Learnability of Gomory Mixed Integer Cuts »
Maria-Florina Balcan · Siddharth Prasad · Tuomas Sandholm · Ellen Vitercik -
2022 Poster: Provably tuning the ElasticNet across instances »
Maria-Florina Balcan · Misha Khodak · Dravyansh Sharma · Ameet Talwalkar -
2022 Poster: Maximizing Revenue under Market Shrinkage and Market Uncertainty »
Maria-Florina Balcan · Siddharth Prasad · Tuomas Sandholm -
2022 Poster: On the Generalizability and Predictability of Recommender Systems »
Duncan McElfresh · Sujay Khandagale · Jonathan Valverde · John Dickerson · Colin White -
2022 Poster: NAS-Bench-Suite-Zero: Accelerating Research on Zero Cost Proxies »
Arjun Krishnakumar · Colin White · Arber Zela · Renbo Tu · Mahmoud Safari · Frank Hutter -
2022 Poster: Learning Predictions for Algorithms with Predictions »
Misha Khodak · Maria-Florina Balcan · Ameet Talwalkar · Sergei Vassilvitskii -
2021 Poster: How Powerful are Performance Predictors in Neural Architecture Search? »
Colin White · Arber Zela · Robin Ru · Yang Liu · Frank Hutter -
2021 Poster: Data driven semi-supervised learning »
Maria-Florina Balcan · Dravyansh Sharma -
2021 Poster: Federated Hyperparameter Tuning: Challenges, Baselines, and Connections to Weight-Sharing »
Mikhail Khodak · Renbo Tu · Tian Li · Liam Li · Maria-Florina Balcan · Virginia Smith · Ameet Talwalkar -
2021 Poster: NAS-Bench-x11 and the Power of Learning Curves »
Shen Yan · Colin White · Yash Savani · Frank Hutter -
2021 Poster: Sample Complexity of Tree Search Configuration: Cutting Planes and Beyond »
Maria-Florina Balcan · Siddharth Prasad · Tuomas Sandholm · Ellen Vitercik -
2021 Poster: Learning-to-learn non-convex piecewise-Lipschitz functions »
Maria-Florina Balcan · Mikhail Khodak · Dravyansh Sharma · Ameet Talwalkar -
2021 Oral: Data driven semi-supervised learning »
Maria-Florina Balcan · Dravyansh Sharma -
2020 Poster: A Study on Encodings for Neural Architecture Search »
Colin White · Willie Neiswanger · Sam Nolen · Yash Savani -
2020 Spotlight: A Study on Encodings for Neural Architecture Search »
Colin White · Willie Neiswanger · Sam Nolen · Yash Savani -
2020 Poster: Intra-Processing Methods for Debiasing Neural Networks »
Yash Savani · Colin White · Naveen Sundar Govindarajulu -
2019 : Coffee/Poster session 1 »
Shiro Takagi · Khurram Javed · Johanna Sommer · Amr Sharaf · Pierluca D'Oro · Ying Wei · Sivan Doveh · Colin White · Santiago Gonzalez · Cuong Nguyen · Mao Li · Tianhe Yu · Tiago Ramalho · Masahiro Nomura · Ahsan Alvi · Jean-Francois Ton · W. Ronny Huang · Jessica Lee · Sebastian Flennerhag · Michael Zhang · Abram Friesen · Paul Blomstedt · Alina Dubatovka · Sergey Bartunov · Subin Yi · Iaroslav Shcherbatyi · Christian Simon · Zeyuan Shang · David MacLeod · Lu Liu · Liam Fowl · Diego Mesquita · Deirdre Quillen -
2019 Poster: Envy-Free Classification »
Maria-Florina Balcan · Travis Dick · Ritesh Noothigattu · Ariel Procaccia -
2019 Poster: Adaptive Gradient-Based Meta-Learning Methods »
Misha Khodak · Maria-Florina Balcan · Ameet Talwalkar -
2018 : Poster Session »
Phillipp Schoppmann · Patrick Yu · Valerie Chen · Travis Dick · Marc Joye · Ningshan Zhang · Frederik Harder · Olli Saarikivi · Théo Ryffel · Yunhui Long · Théo JOURDAN · Di Wang · Antonio Marcedone · Negev Shekel Nosatzki · Yatharth A Dubey · Antti Koskela · Peter Bloem · Aleksandra Korolova · Martin Bertran · Hao Chen · Galen Andrew · Natalia Martinez · Janardhan Kulkarni · Jonathan Passerat-Palmbach · Guillermo Sapiro · Amrita Roy Chowdhury -
2018 : Posters 1 »
Wei Wei · Flavio Calmon · Travis Dick · Leilani Gilpin · Maroussia Lévesque · Malek Ben Salem · Michael Wang · Jack Fitzsimons · Dimitri Semenovich · Linda Gu · Nathaniel Fruchter -
2017 : Invited Talk: Sample and Computationally Efficient Active Learning Algorithms »
Maria-Florina Balcan -
2017 Poster: Sample and Computationally Efficient Learning Algorithms under S-Concave Distributions »
Maria-Florina Balcan · Hongyang Zhang -
2016 Poster: Noise-Tolerant Life-Long Matrix Completion via Adaptive Sampling »
Maria-Florina Balcan · Hongyang Zhang -
2016 Poster: Sample Complexity of Automated Mechanism Design »
Maria-Florina Balcan · Tuomas Sandholm · Ellen Vitercik