Weak supervision (WS) is a rich set of techniques that produce pseudolabels by aggregating easily obtained but potentially noisy label estimates from various sources. WS is theoretically well understood for binary classification, where simple approaches enable consistent estimation of pseudolabel noise rates. Using this result, it has been shown that downstream models trained on the pseudolabels have generalization guarantees nearly identical to those trained on clean labels. While this is exciting, users often wish to use WS for structured prediction, where the output space consists of more than a binary or multi-class label set: e.g., rankings, graphs, manifolds, and more. Do the favorable theoretical properties of WS for binary classification lift to this setting? We answer this question in the affirmative for a wide range of scenarios. For labels taking values in a finite metric space, we introduce techniques new to weak supervision based on pseudo-Euclidean embeddings and tensor decompositions, providing a nearly-consistent noise rate estimator. For labels in constant-curvature Riemannian manifolds, we introduce new invariants that also yield consistent noise rate estimation. In both cases, when using the resulting pseudolabels in concert with a flexible downstream model, we obtain generalization guarantees nearly identical to those for models trained on clean data. Several of our results, which can be viewed as robustness guarantees in structured prediction with noisy labels, may be of independent interest.
Author Information
Harit Vishwakarma (University of Wisconsin-Madison)
Frederic Sala (University of Wisconsin-Madison)
More from the Same Authors
- 2022: Anomaly Detection with Multiple Reference Datasets in High Energy Physics
  Mayee Chen · Benjamin Nachman · Frederic Sala
- 2022: AutoML for Climate Change: A Call to Action
  Renbo Tu · Nicholas Roberts · Vishak Prasad C · Sibasis Nayak · Paarth Jain · Frederic Sala · Ganesh Ramakrishnan · Ameet Talwalkar · Willie Neiswanger · Colin White
- 2022: Domain Generalization with Nuclear Norm Regularization
  Zhenmei Shi · Yifei Ming · Ying Fan · Frederic Sala · Yingyu Liang
- 2023 Poster: Mitigating Source Bias for Fairer Weak Supervision
  Changho Shin · Sonia Cromp · Dyah Adila · Frederic Sala
- 2023 Poster: Geometry-Aware Adaptation for Pretrained Models
  Nicholas Roberts · Xintong Li · Dyah Adila · Sonia Cromp · Tzu-Heng Huang · Jitian Zhao · Frederic Sala
- 2023 Poster: Promises and Pitfalls of Threshold-based Auto-labeling
  Harit Vishwakarma · Heguang Lin · Frederic Sala · Ramya Korlakai Vinayak
- 2023 Poster: Skill-it! A data-driven skills framework for understanding and training language models
  Mayee Chen · Nicholas Roberts · Kush Bhatia · Jue WANG · Ce Zhang · Frederic Sala · Christopher Ré
- 2023 Poster: Train 'n Trade: Foundations of Parameter Markets
  Tzu-Heng Huang · Harit Vishwakarma · Frederic Sala
- 2023 Poster: Embroid: Unsupervised Prediction Smoothing Can Improve Few-Shot Classification
  Neel Guha · Mayee Chen · Kush Bhatia · Azalia Mirhoseini · Frederic Sala · Christopher Ré
- 2022 Competition: AutoML Decathlon: Diverse Tasks, Modern Methods, and Efficiency at Scale
  Samuel Guo · Cong Xu · Nicholas Roberts · Misha Khodak · Junhong Shen · Evan Sparks · Ameet Talwalkar · Yuriy Nevmyvaka · Frederic Sala · Anderson Schneider
- 2022: Q & A
  Frederic Sala · Ramya Korlakai Vinayak
- 2022 Tutorial: Theory and Practice of Efficient and Accurate Dataset Construction
  Frederic Sala · Ramya Korlakai Vinayak
- 2022: Tutorial part 1
  Frederic Sala · Ramya Korlakai Vinayak
- 2022 Poster: AutoWS-Bench-101: Benchmarking Automated Weak Supervision with 100 Labels
  Nicholas Roberts · Xintong Li · Tzu-Heng Huang · Dyah Adila · Spencer Schoenberg · Cheng-Yu Liu · Lauren Pick · Haotian Ma · Aws Albarghouthi · Frederic Sala
- 2022 Poster: NAS-Bench-360: Benchmarking Neural Architecture Search on Diverse Tasks
  Renbo Tu · Nicholas Roberts · Misha Khodak · Junhong Shen · Frederic Sala · Ameet Talwalkar
- 2020 Poster: Attack of the Tails: Yes, You Really Can Backdoor Federated Learning
  Hongyi Wang · Kartik Sreenivasan · Shashank Rajput · Harit Vishwakarma · Saurabh Agarwal · Jy-yong Sohn · Kangwook Lee · Dimitris Papailiopoulos
- 2020 Poster: Optimal Lottery Tickets via Subset Sum: Logarithmic Over-Parameterization is Sufficient
  Ankit Pensia · Shashank Rajput · Alliot Nagle · Harit Vishwakarma · Dimitris Papailiopoulos
- 2020 Spotlight: Optimal Lottery Tickets via Subset Sum: Logarithmic Over-Parameterization is Sufficient
  Ankit Pensia · Shashank Rajput · Alliot Nagle · Harit Vishwakarma · Dimitris Papailiopoulos
- 2019 Poster: Quantum Embedding of Knowledge for Reasoning
  Dinesh Garg · Shajith Ikbal Mohamed · Santosh Kumar Srivastava · Harit Vishwakarma · Hima Karanam · L Venkata Subramaniam