

Talk in Competition: Competition Track Day 2: Overviews + Breakout Sessions

Billion-Scale Approximate Nearest Neighbor Search Challenge + Q&A

Harsha Vardhan Simhadri · George Williams · Martin Aumüller · Artem Babenko · Dmitry Baranchuk · Qi Chen · Matthijs Douze · Ravishankar Krishnaswamy · Gopal Srinivasa · Suhas Jayaram Subramanya · Jingdong Wang


Abstract:

Approximate Nearest Neighbor Search (ANNS) amounts to finding points close to a given query point in a high-dimensional vector space. ANNS algorithms optimize a tradeoff between search speed, memory usage, and accuracy relative to an exact sequential search. Thanks to efforts like ann-benchmarks.com, the state of the art for ANNS on million-scale datasets is quite clear. This competition aims to push the scale to out-of-memory, billion-scale datasets and to other hardware configurations that are realistic in many current applications. The competition uses six representative billion-scale datasets -- many newly released for this competition -- with their associated accuracy metrics. There are three tracks depending on the hardware setting: (T1) limited memory; (T2) limited main memory + SSD; (T3) any hardware configuration, including accelerators and custom silicon. We will use two recent indexing algorithms, DiskANN and FAISS, as baselines for tracks T1 and T2. The anticipated impact is an understanding of the ideas that apply at billion-point scale, bridging communities that work on ANNS problems, and a platform for newer researchers to contribute to and develop this relatively new research area. We will provide Azure cloud compute credit to participants with promising ideas who lack the infrastructure needed to develop their submissions.
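
For concreteness, the speed/accuracy tradeoff described above is typically measured as recall against exact search results. The sketch below is not part of the competition materials and runs at toy scale rather than billion scale; it assumes the Faiss Python package (named in the abstract as a baseline) and shows an approximate IVF index compared against a brute-force index, with recall@k as the fraction of true nearest neighbors recovered.

    import numpy as np
    import faiss  # assumed available, e.g. via `pip install faiss-cpu`

    d, nb, nq, k = 128, 100_000, 1_000, 10               # toy scale for illustration
    rng = np.random.default_rng(0)
    xb = rng.standard_normal((nb, d)).astype("float32")  # database vectors
    xq = rng.standard_normal((nq, d)).astype("float32")  # query vectors

    # Ground truth: exact (brute-force) k-nearest neighbors.
    flat = faiss.IndexFlatL2(d)
    flat.add(xb)
    _, gt = flat.search(xq, k)

    # Approximate index: inverted-file (IVF) structure over coarse clusters.
    nlist = 1024                                          # number of coarse clusters
    ivf = faiss.IndexIVFFlat(faiss.IndexFlatL2(d), d, nlist)
    ivf.train(xb)
    ivf.add(xb)
    ivf.nprobe = 16                                       # clusters scanned per query: the speed/accuracy knob
    _, approx = ivf.search(xq, k)

    # recall@k: fraction of the true k nearest neighbors found by the approximate search.
    recall = np.mean([len(set(gt[i]) & set(approx[i])) / k for i in range(nq)])
    print(f"recall@{k} = {recall:.3f}")

Raising nprobe scans more clusters per query, trading search speed for higher recall; the competition tracks evaluate exactly this kind of tradeoff under fixed hardware budgets.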