Poster
Affinity Clustering: Hierarchical Clustering at Scale
Mohammadhossein Bateni · Soheil Behnezhad · Mahsa Derakhshan · MohammadTaghi Hajiaghayi · Raimondas Kiveris · Silvio Lattanzi · Vahab Mirrokni
Graph clustering is a fundamental task in many data-mining and machine-learning pipelines. In particular, identifying a good hierarchical structure is both a fundamental and a challenging problem for several applications. Because the amount of data to analyze grows rapidly every day, new solutions are needed to compute effective hierarchical clusterings efficiently on such huge data. The main focus of this paper is on minimum spanning tree (MST) based clusterings. In particular, we propose affinity, a novel hierarchical clustering based on Boruvka's MST algorithm. We prove certain theoretical guarantees for affinity (as well as some other classic algorithms) and show that in practice it is superior to several other state-of-the-art clustering algorithms. Furthermore, we present two MapReduce implementations of affinity. The first works when the input graph is dense and runs in a constant number of rounds. It is based on a Massively Parallel MST algorithm for dense graphs that improves upon the state-of-the-art algorithm of Lattanzi et al. (SPAA 2011). The second makes no assumption about the density of the input graph and finds the affinity clustering in $O(\log n)$ rounds using Distributed Hash Tables (DHTs). We show experimentally that our algorithms scale to huge data sets, e.g., graphs with trillions of edges.
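To make the Boruvka-based idea concrete, here is a minimal single-machine Python sketch of affinity-style clustering; it is not the MapReduce or DHT implementation described in the abstract. It assumes edge weights are dissimilarities (smaller means closer) and breaks ties by the order (weight, smaller endpoint, larger endpoint). The function name `affinity_clustering` and the output format are hypothetical choices for illustration. In each round every current cluster selects its cheapest outgoing edge, and all selected edges are contracted at once; the partition after each round is one level of the hierarchy.

```python
from typing import Dict, List, Tuple

# (u, v, w): undirected edge between vertices u and v with dissimilarity w.
Edge = Tuple[int, int, float]


def affinity_clustering(n: int, edges: List[Edge]) -> List[List[List[int]]]:
    """Sequential sketch of Boruvka-style affinity clustering.

    Returns one partition of range(n) per round; each partition is a sorted
    list of sorted clusters. Smaller weights mean "closer" (an assumption).
    """
    parent = list(range(n))  # union-find forest over vertices

    def find(x: int) -> int:
        # Path halving keeps the union-find trees shallow.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    levels: List[List[List[int]]] = []
    while True:
        # Each cluster (identified by its root) picks its cheapest outgoing
        # edge; ties are broken by a total order on (w, min endpoint,
        # max endpoint) so the chosen edges never form a cycle.
        best: Dict[int, Tuple[float, int, int]] = {}
        for u, v, w in edges:
            cu, cv = find(u), find(v)
            if cu == cv:
                continue  # edge lies inside a cluster
            key = (w, min(u, v), max(u, v))
            for c in (cu, cv):
                if c not in best or key < best[c]:
                    best[c] = key
        if not best:
            break  # no cluster has an outgoing edge: hierarchy is complete
        # Contract all chosen edges at once (one Boruvka round).
        for _, u, v in best.values():
            ru, rv = find(u), find(v)
            if ru != rv:
                parent[ru] = rv
        # Record this level of the hierarchy.
        groups: Dict[int, List[int]] = {}
        for x in range(n):
            groups.setdefault(find(x), []).append(x)
        levels.append(sorted(groups.values()))
    return levels


if __name__ == "__main__":
    demo = [(0, 1, 1.0), (1, 2, 2.0), (2, 3, 5.0),
            (3, 4, 1.5), (4, 5, 1.0), (0, 5, 9.0)]
    print(affinity_clustering(6, demo))
    # [[[0, 1, 2], [3, 4, 5]], [[0, 1, 2, 3, 4, 5]]]
```

Each iteration of the `while` loop is one Boruvka round. Every cluster that still has an outgoing edge merges with at least one other cluster, so the number of such clusters at least halves per round. This is consistent with the $O(\log n)$ round bound stated for the DHT-based algorithm.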
Author Information
Mohammadhossein Bateni (Google Research)
Soheil Behnezhad (University of Maryland)
Mahsa Derakhshan (University of Maryland)
MohammadTaghi Hajiaghayi (University of Maryland)
Raimondas Kiveris (Google Research)
Silvio Lattanzi (Google Research)
Vahab Mirrokni (Google Research NYC)
More from the Same Authors
2022: Scalable and Improved Algorithms for Individually Fair Clustering
Mohammadhossein Bateni · Vincent Cohen-Addad · Alessandro Epasto · Silvio Lattanzi
2023 Poster: Multi-Swap k-Means++
Lorenzo Beretta · Vincent Cohen-Addad · Silvio Lattanzi · Nikos Parotsidis
2023 Poster: Fully Dynamic $k$-Clustering in $\tilde O(k)$ Update Time
Sayan Bhattacharya · Martin Costa · Silvio Lattanzi · Nikos Parotsidis
2023 Poster: Fair, Polylog-Approximate Low-Cost Hierarchical Clustering
Marina Knittel · Max Springer · John Dickerson · MohammadTaghi Hajiaghayi
2023 Poster: Bandit Social Learning under Myopic Behavior
Kiarash Banihashem · MohammadTaghi Hajiaghayi · Suho Shin · Aleksandrs Slivkins
2023 Poster: An Improved Relaxation for Oracle-Efficient Adversarial Contextual Bandits
Kiarash Banihashem · MohammadTaghi Hajiaghayi · Suho Shin · Max Springer
2023 Poster: Dynamic Non-monotone Submodular Maximization
Kiarash Banihashem · Leyla Biabani · Samira Goudarzi · MohammadTaghi Hajiaghayi · Peyman Jabbarzade · Morteza Monemizadeh
2022 Poster: Active Learning of Classifiers with Label and Seed Queries
Marco Bressan · Nicolò Cesa-Bianchi · Silvio Lattanzi · Andrea Paudice · Maximilian Thiessen
2022 Poster: Near-Optimal Correlation Clustering with Privacy
Vincent Cohen-Addad · Chenglin Fan · Silvio Lattanzi · Slobodan Mitrovic · Ashkan Norouzi-Fard · Nikos Parotsidis · Jakub Tarnawski
2022 Poster: Online Algorithms for the Santa Claus Problem
Max Springer · MohammadTaghi Hajiaghayi · Debmalya Panigrahi · Mohammad Khani
2022 Poster: Efficient and Stable Fully Dynamic Facility Location
Sayan Bhattacharya · Silvio Lattanzi · Nikos Parotsidis
2021 Poster: Online Facility Location with Multiple Advice
Matteo Almanza · Flavio Chierichetti · Silvio Lattanzi · Alessandro Panconesi · Giuseppe Re
2021 Poster: Robust Online Correlation Clustering
Silvio Lattanzi · Benjamin Moseley · Sergei Vassilvitskii · Yuyan Wang · Rudy Zhou
2021 Poster: Parallel and Efficient Hierarchical k-Median Clustering
Vincent Cohen-Addad · Silvio Lattanzi · Ashkan Norouzi-Fard · Christian Sohler · Ola Svensson
2021 Poster: Efficient and Local Parallel Random Walks
Michael Kapralov · Silvio Lattanzi · Navid Nouri · Jakab Tardos
2021 Poster: Streaming Belief Propagation for Community Detection
Yuchen Wu · Jakab Tardos · Mohammadhossein Bateni · André Linhares · Filipe Miguel Goncalves de Almeida · Andrea Montanari · Ashkan Norouzi-Fard
2021 Poster: On Margin-Based Cluster Recovery with Oracle Queries
Marco Bressan · Nicolò Cesa-Bianchi · Silvio Lattanzi · Andrea Paudice
2020 Poster: Fully Dynamic Algorithm for Constrained Submodular Optimization
Silvio Lattanzi · Slobodan Mitrović · Ashkan Norouzi-Fard · Jakub Tarnawski · Morteza Zadimoghaddam
2020 Oral: Fully Dynamic Algorithm for Constrained Submodular Optimization
Silvio Lattanzi · Slobodan Mitrović · Ashkan Norouzi-Fard · Jakub Tarnawski · Morteza Zadimoghaddam
2020 Poster: Optimal Approximation - Smoothness Tradeoffs for Soft-Max Functions
Alessandro Epasto · Mohammad Mahdian · Vahab Mirrokni · Emmanouil Zampetakis
2020 Spotlight: Optimal Approximation - Smoothness Tradeoffs for Soft-Max Functions
Alessandro Epasto · Mohammad Mahdian · Vahab Mirrokni · Emmanouil Zampetakis
2020 Poster: Sliding Window Algorithms for k-Clustering Problems
Michele Borassi · Alessandro Epasto · Silvio Lattanzi · Sergei Vassilvitskii · Morteza Zadimoghaddam
2020 Poster: Fast and Accurate $k$-means++ via Rejection Sampling
Vincent Cohen-Addad · Silvio Lattanzi · Ashkan Norouzi-Fard · Christian Sohler · Ola Svensson
2020 Poster: Online MAP Inference of Determinantal Point Processes
Aditya Bhaskara · Amin Karbasi · Silvio Lattanzi · Morteza Zadimoghaddam
2020 Poster: Exact Recovery of Mangled Clusters with Same-Cluster Queries
Marco Bressan · Nicolò Cesa-Bianchi · Silvio Lattanzi · Andrea Paudice
2020 Poster: Smoothly Bounding User Contributions in Differential Privacy
Alessandro Epasto · Mohammad Mahdian · Jieming Mao · Vahab Mirrokni · Lijie Ren
2020 Poster: Contextual Reserve Price Optimization in Auctions via Mixed Integer Programming
Joey Huchette · Haihao Lu · Hossein Esfandiari · Vahab Mirrokni
2020 Oral: Exact Recovery of Mangled Clusters with Same-Cluster Queries
Marco Bressan · Nicolò Cesa-Bianchi · Silvio Lattanzi · Andrea Paudice
2020 Session: Orals & Spotlights Track 05: Clustering/Ranking
Silvio Lattanzi · Katerina Fragkiadaki
2020: Clustering At Scale
Vahab Mirrokni
2020 Expo Workshop: Mining and Learning with Graphs at Scale
Vahab Mirrokni · Bryan Perozzi · Jakub Lacki · Jonathan Halcrow · Jaqui C Herman
2020: Introduction
Vahab Mirrokni
2019: Coffee Break & Poster Session 1
Yan Zhang · Jonathon Hare · Adam Prugel-Bennett · Po Leung · Patrick Flaherty · Pitchaya Wiratchotisatian · Alessandro Epasto · Silvio Lattanzi · Sergei Vassilvitskii · Morteza Zadimoghaddam · Theja Tulabandhula · Fabian Fuchs · Adam Kosiorek · Ingmar Posner · William Hang · Anna Goldie · Sujith Ravi · Azalia Mirhoseini · Yuwen Xiong · Mengye Ren · Renjie Liao · Raquel Urtasun · Haici Zhang · Michele Borassi · Shengda Luo · Andrew Trapp · Geoffroy Dubourg-Felonneau · Yasmeen Kussad · Christopher Bender · Manzil Zaheer · Junier Oliva · Michał Stypułkowski · Maciej Zieba · Austin Dill · Chun-Liang Li · Songwei Ge · Eunsu Kang · Oiwi Parker Jones · Kelvin Ka Wing Wong · Joshua Payne · Yang Li · Azade Nazi · Erkut Erdem · Aykut Erdem · Kevin O'Connor · Juan J Garcia · Maciej Zamorski · Jan Chorowski · Deeksha Sinha · Harry Clifford · John W Cassidy
2019 Poster: Contextual Bandits with Cross-Learning
Santiago Balseiro · Negin Golrezaei · Mohammad Mahdian · Vahab Mirrokni · Jon Schneider
2019 Poster: Dynamic Incentive-Aware Learning: Robust Pricing in Contextual Auctions
Negin Golrezaei · Adel Javanmard · Vahab Mirrokni
2019 Poster: A Robust Non-Clairvoyant Dynamic Mechanism for Contextual Auctions
Yuan Deng · Sébastien Lahaie · Vahab Mirrokni
2019 Poster: Locality-Sensitive Hashing for f-Divergences: Mutual Information Loss and Beyond
Lin Chen · Hossein Esfandiari · Gang Fu · Vahab Mirrokni
2019 Poster: Variance Reduction in Bipartite Experiments through Correlation Clustering
Jean Pouget-Abadie · Kevin Aydin · Warren Schudy · Kay Brodersen · Vahab Mirrokni
2018 Poster: Mallows Models for Top-k Lists
Flavio Chierichetti · Anirban Dasgupta · Shahrzad Haddadan · Ravi Kumar · Silvio Lattanzi
2017 Poster: Dynamic Revenue Sharing
Santiago Balseiro · Max Lin · Vahab Mirrokni · Renato Leme · Song Zuo
2016 Poster: Bi-Objective Online Matching and Submodular Allocations
Hossein Esfandiari · Nitish Korula · Vahab Mirrokni
2016 Poster: Community Detection on Evolving Graphs
Stefano Leonardi · Aris Anagnostopoulos · Jakub Łącki · Silvio Lattanzi · Mohammad Mahdian
2016 Poster: Linear Relaxations for Finding Diverse Elements in Metric Spaces
Aditya Bhaskara · Mehrdad Ghadiri · Vahab Mirrokni · Ola Svensson
2014 Poster: Distributed Balanced Clustering via Mapping Coresets
Mohammadhossein Bateni · Aditya Bhaskara · Silvio Lattanzi · Vahab Mirrokni