Timezone: »

 
Poster
Clustering with Noisy Queries
Arya Mazumdar · Barna Saha

Wed Dec 06 06:30 PM -- 10:30 PM (PST) @ Pacific Ballroom #164
In this paper, we provide a rigorous theoretical study of clustering with noisy queries. Given a set of $n$ elements, our goal is to recover the true clustering by asking minimum number of pairwise queries to an oracle. Oracle can answer queries of the form ``do elements $u$ and $v$ belong to the same cluster?''-the queries can be asked interactively (adaptive queries), or non-adaptively up-front, but its answer can be erroneous with probability $p$. In this paper, we provide the first information theoretic lower bound on the number of queries for clustering with noisy oracle in both situations. We design novel algorithms that closely match this query complexity lower bound, even when the number of clusters is unknown. Moreover, we design computationally efficient algorithms both for the adaptive and non-adaptive settings. The problem captures/generalizes multiple application scenarios. It is directly motivated by the growing body of work that use crowdsourcing for {\em entity resolution}, a fundamental and challenging data mining task aimed to identify all records in a database referring to the same entity. Here crowd represents the noisy oracle, and the number of queries directly relates to the cost of crowdsourcing. Another application comes from the problem of sign edge prediction in social network, where social interactions can be both positive and negative, and one must identify the sign of all pair-wise interactions by querying a few pairs. Furthermore, clustering with noisy oracle is intimately connected to correlation clustering, leading to improvement therein. Finally, it introduces a new direction of study in the popular stochastic block model where one has an incomplete stochastic block model matrix to recover the clusters.

Author Information

Arya Mazumdar (University of Massachusetts Amherst)
Barna Saha (University of Massachusetts Amherst)

More from the Same Authors

  • 2021 Poster: Support Recovery of Sparse Signals from a Mixture of Linear Measurements »
    Soumyabrata Pal · Arya Mazumdar · Venkata Gandikota
  • 2021 Poster: Fuzzy Clustering with Similarity Queries »
    Wasim Huleihel · Arya Mazumdar · Soumyabrata Pal
  • 2019 : Poster Session »
    Gergely Flamich · Shashanka Ubaru · Charles Zheng · Josip Djolonga · Kristoffer Wickstrøm · Diego Granziol · Konstantinos Pitas · Jun Li · Robert Williamson · Sangwoong Yoon · Kwot Sin Lee · Julian Zilly · Linda Petrini · Ian Fischer · Zhe Dong · Alexander Alemi · Bao-Ngoc Nguyen · Rob Brekelmans · Tailin Wu · Aditya Mahajan · Alexander Li · Kirankumar Shiragur · Yair Carmon · Linara Adilova · SHIYU LIU · Bang An · Sanjeeb Dash · Oktay Gunluk · Arya Mazumdar · Mehul Motani · Julia Rosenzweig · Michael Kamp · Marton Havasi · Leighton P Barnes · Zhengqing Zhou · Yi Hao · Dylan Foster · Yuval Benjamini · Nati Srebro · Michael Tschannen · Paul Rubenstein · Sylvain Gelly · John Duchi · Aaron Sidford · Robin Ru · Stefan Zohren · Murtaza Dalal · Michael A Osborne · Stephen J Roberts · Moses Charikar · Jayakumar Subramanian · Xiaodi Fan · Max Schwarzer · Nicholas Roberts · Simon Lacoste-Julien · Vinay Prabhu · Aram Galstyan · Greg Ver Steeg · Lalitha Sankar · Yung-Kyun Noh · Gautam Dasarathy · Frank Park · Ngai-Man (Man) Cheung · Ngoc-Trung Tran · Linxiao Yang · Ben Poole · Andrea Censi · Tristan Sylvain · R Devon Hjelm · Bangjie Liu · Jose Gallego-Posada · Tyler Sypherd · Kai Yang · Jan Nikolas Morshuis
  • 2019 : Poster Session »
    Jonathan Scarlett · Piotr Indyk · Ali Vakilian · Adrian Weller · Partha P Mitra · Benjamin Aubin · Bruno Loureiro · Florent Krzakala · Lenka Zdeborová · Kristina Monakhova · Joshua Yurtsever · Laura Waller · Hendrik Sommerhoff · Michael Moeller · Rushil Anirudh · Shuang Qiu · Xiaohan Wei · Zhuoran Yang · Jayaraman Thiagarajan · Salman Asif · Michael Gillhofer · Johannes Brandstetter · Sepp Hochreiter · Felix Petersen · Dhruv Patel · Assad Oberai · Akshay Kamath · Sushrut Karmalkar · Eric Price · Ali Ahmed · Zahra Kadkhodaie · Sreyas Mohan · Eero Simoncelli · Carlos Fernandez-Granda · Oscar Leong · Wesam Sakla · Rebecca Willett · Stephan Hoyer · Jascha Sohl-Dickstein · Sam Greydanus · Gauri Jagatap · Chinmay Hegde · Michael Kellman · Jonathan Tamir · Nouamane Laanait · Ousmane Dia · Mirco Ravanelli · Jonathan Binas · Negar Rostamzadeh · Shirin Jalali · Tiantian Fang · Alex Schwing · Sébastien Lachapelle · Philippe Brouillard · Tristan Deleu · Simon Lacoste-Julien · Stella Yu · Arya Mazumdar · Ankit Singh Rawat · Yue Zhao · Jianshu Chen · Xiaoyang Li · Hubert Ramsauer · Gabrio Rizzuti · Nikolaos Mitsakos · Dingzhou Cao · Thomas Strohmer · Yang Li · Pei Peng · Gregory Ongie
  • 2019 Poster: Superset Technique for Approximate Recovery in One-Bit Compressed Sensing »
    Larkin Flodin · Venkata Gandikota · Arya Mazumdar
  • 2019 Poster: Sample Complexity of Learning Mixture of Sparse Linear Regressions »
    Akshay Krishnamurthy · Arya Mazumdar · Andrew McGregor · Soumyabrata Pal
  • 2019 Poster: Same-Cluster Querying for Overlapping Clusters »
    Wasim Huleihel · Arya Mazumdar · Muriel Medard · Soumyabrata Pal
  • 2017 Poster: Semisupervised Clustering, AND-Queries and Locally Encodable Source Coding »
    Arya Mazumdar · Soumyabrata Pal
  • 2017 Spotlight: Semisupervised Clustering, AND-Queries and Locally Encodable Source Coding »
    Arya Mazumdar · Soumyabrata Pal
  • 2017 Poster: Query Complexity of Clustering with Side Information »
    Arya Mazumdar · Barna Saha
  • 2015 Poster: Associative Memory via a Sparse Recovery Model »
    Arya Mazumdar · Ankit Singh Rawat