A fundamental procedure in the analysis of massive datasets is the construction of similarity graphs. Such graphs play a key role for many downstream tasks, including clustering, classification, graph learning, and nearest neighbor search. For these tasks, it is critical to build graphs which are sparse yet still representative of the underlying data. The benefits of sparsity are twofold: firstly, constructing dense graphs is infeasible in practice for large datasets, and secondly, the runtime of downstream tasks is directly influenced by the sparsity of the similarity graph. In this work, we present Stars: a highly scalable method for building extremely sparse graphs via two-hop spanners, which are graphs where similar points are connected by a path of length at most two. Stars can construct two-hop spanners with significantly fewer similarity comparisons than conventional graph-building approaches; such comparisons are a major bottleneck for learning-based models where they are expensive to evaluate. Theoretically, we demonstrate that Stars builds a graph in nearly-linear time in which approximate nearest neighbors are contained within two-hop neighborhoods. In practice, we have deployed Stars on multiple datasets, allowing for graph building at the Tera-Scale, i.e., for graphs with hundreds of billions of nodes and tens of trillions of edges. We evaluate the performance of Stars for clustering and graph learning, and demonstrate 10- to 1000-fold reductions in pairwise similarity comparisons and significant running time speedups with negligible quality loss.
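To make the two-hop spanner idea concrete, the following is a minimal toy sketch (not the authors' implementation): points are grouped into buckets with a simple random-hyperplane locality-sensitive hash, and every point in a bucket is connected to a single "star center". Two similar points that collide in the same bucket are then joined by a path of length at most two through the shared center, while only a linear number of edges is created. The hash scheme, bucket sizes, and center choice here are illustrative assumptions, not the Stars algorithm itself.

```python
# Toy sketch of a star-based two-hop spanner over LSH buckets.
# Assumptions: cosine-style similarity, random-hyperplane hashing,
# and the first bucket member chosen as the star center.
import numpy as np
from collections import defaultdict


def build_star_graph(points: np.ndarray, num_hyperplanes: int = 8, seed: int = 0):
    """Return a sparse edge list forming one star per LSH bucket."""
    rng = np.random.default_rng(seed)
    n, d = points.shape

    # Random-hyperplane signatures: points pointing in similar directions
    # tend to share the same sign pattern and collide into the same bucket.
    hyperplanes = rng.normal(size=(num_hyperplanes, d))
    signatures = points @ hyperplanes.T > 0

    buckets = defaultdict(list)
    for idx, sig in enumerate(signatures):
        buckets[tuple(sig)].append(idx)

    # Connect every bucket member to its star center; any two members of the
    # same bucket are then at most two hops apart.
    edges = []
    for members in buckets.values():
        center = members[0]
        edges.extend((center, other) for other in members[1:])
    return edges


if __name__ == "__main__":
    pts = np.random.default_rng(1).normal(size=(1000, 16))
    edges = build_star_graph(pts)
    print(f"{len(edges)} edges for {len(pts)} points")  # far fewer than ~500k pairs
```

In this sketch the number of edges is at most n minus the number of buckets, compared with roughly n^2/2 pairwise comparisons for a dense similarity graph; the actual system additionally controls bucket sizes and center selection to obtain its nearly-linear-time guarantees.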
Author Information
CJ Carey (Google)
Jonathan Halcrow (Google)
Rajesh Jayaram (Google)
Vahab Mirrokni (Google Research)
Warren Schudy (Google)
Peilin Zhong (Google)
More from the Same Authors
- 2022 Poster: Posted Pricing and Dynamic Prior-independent Mechanisms with Value Maximizers
  Yuan Deng · Vahab Mirrokni · Hanrui Zhang
- 2022: Differentially Private Graph Learning via Sensitivity-Bounded Personalized PageRank
  Alessandro Epasto · Vahab Mirrokni · Bryan Perozzi · Anton Tsitsulin · Peilin Zhong
- 2022 Poster: Differentially Private Graph Learning via Sensitivity-Bounded Personalized PageRank
  Alessandro Epasto · Vahab Mirrokni · Bryan Perozzi · Anton Tsitsulin · Peilin Zhong
- 2022 Poster: Near-Optimal Private and Scalable $k$-Clustering
  Vincent Cohen-Addad · Alessandro Epasto · Vahab Mirrokni · Shyam Narayanan · Peilin Zhong
- 2022 Poster: Anonymous Bandits for Multi-User Systems
  Hossein Esfandiari · Vahab Mirrokni · Jon Schneider
- 2022 Poster: Hierarchical Agglomerative Graph Clustering in Poly-Logarithmic Depth
  Laxman Dhulipala · David Eisenstat · Jakub Lacki · Vahab Mirrokni · Jessica Shi
- 2022 Poster: Cluster Randomized Designs for One-Sided Bipartite Experiments
  Jennifer Brennan · Vahab Mirrokni · Jean Pouget-Abadie
- 2021 Poster: Practical Large-Scale Linear Programming using Primal-Dual Hybrid Gradient
  David Applegate · Mateo Diaz · Oliver Hinder · Haihao Lu · Miles Lubin · Brendan O'Donoghue · Warren Schudy
- 2021 Poster: Robust Auction Design in the Auto-bidding World
  Santiago Balseiro · Yuan Deng · Jieming Mao · Vahab Mirrokni · Song Zuo
- 2021 Poster: Synthetic Design: An Optimization Approach to Experimental Design with Synthetic Controls
  Nick Doudchenko · Khashayar Khosravi · Jean Pouget-Abadie · Sébastien Lahaie · Miles Lubin · Vahab Mirrokni · Jann Spiess · Guido Imbens
- 2021 Poster: Parallelizing Thompson Sampling
  Amin Karbasi · Vahab Mirrokni · Mohammad Shadravan
- 2019 Poster: Variance Reduction in Bipartite Experiments through Correlation Clustering
  Jean Pouget-Abadie · Kevin Aydin · Warren Schudy · Kay Brodersen · Vahab Mirrokni