Timezone: »
Existing weak supervision approaches use all the data covered by weak signals to train a classifier. We show both theoretically and empirically that this is not always optimal. Intuitively, there is a tradeoff between the amount of weakly-labeled data and the precision of the weak labels. We explore this tradeoff by combining pretrained data representations with the cut statistic to select (hopefully) high-quality subsets of the weakly-labeled training data. Subset selection applies to any label model and classifier and is very simple to plug in to existing weak supervision pipelines, requiring just a few lines of code. We show our subset selection method improves the performance of weak supervision for a wide range of label models, classifiers, and datasets. Using less weakly-labeled data improves the accuracy of weak supervision pipelines by up to 19% (absolute) on benchmark tasks.
Author Information
Hunter Lang (MIT)
Aravindan Vijayaraghavan (Northwestern University)
David Sontag (MIT)
More from the Same Authors
-
2022 : PEST: Combining Parameter-Efficient Fine-Tuning with Self-Training and Co-Training »
Hunter Lang · Monica Agrawal · Yoon Kim · David Sontag -
2023 Poster: Effective Human-AI Teams via Learned Natural Language Rules and Onboarding »
Hussein Mozannar · Jimin Lee · Dennis Wei · Prasanna Sattigeri · Subhro Das · David Sontag -
2022 Poster: The Burer-Monteiro SDP method can fail even above the Barvinok-Pataki bound »
Liam O'Carroll · Vaidehi Srinivas · Aravindan Vijayaraghavan -
2022 Poster: Falsification before Extrapolation in Causal Effect Estimation »
Zeshan M Hussain · Michael Oberst · Ming-Chieh Shih · David Sontag -
2022 Poster: Evaluating Robustness to Dataset Shift via Parametric Robustness Sets »
Nikolaj Thams · Michael Oberst · David Sontag -
2022 Poster: ETAB: A Benchmark Suite for Visual Representation Learning in Echocardiography »
Ahmed Alaa · Anthony Philippakis · David Sontag -
2021 Poster: Efficient Algorithms for Learning Depth-2 Neural Networks with General ReLU Activations »
Pranjal Awasthi · Alex Tang · Aravindan Vijayaraghavan -
2021 Poster: Finding Regions of Heterogeneity in Decision-Making via Expected Conditional Covariance »
Justin Lim · Christina Ji · Michael Oberst · Saul Blecker · Leora Horwitz · David Sontag -
2020 Poster: Adversarial robustness via robust low rank representations »
Pranjal Awasthi · Himanshu Jain · Ankit Singh Rawat · Aravindan Vijayaraghavan -
2019 Poster: Using Statistics to Automate Stochastic Optimization »
Hunter Lang · Lin Xiao · Pengchuan Zhang -
2019 Poster: On Robustness to Adversarial Examples and Polynomial Optimization »
Pranjal Awasthi · Abhratanu Dutta · Aravindan Vijayaraghavan -
2019 Poster: Understanding the Role of Momentum in Stochastic Gradient Methods »
Igor Gitman · Hunter Lang · Pengchuan Zhang · Lin Xiao -
2018 : TBC 13 »
David Sontag -
2018 Poster: Why Is My Classifier Discriminatory? »
Irene Chen · Fredrik Johansson · David Sontag -
2018 Spotlight: Why Is My Classifier Discriminatory? »
Irene Chen · Fredrik Johansson · David Sontag -
2017 : Invited Talk 4 »
David Sontag -
2017 Poster: Causal Effect Inference with Deep Latent-Variable Models »
Christos Louizos · Uri Shalit · Joris Mooij · David Sontag · Richard Zemel · Max Welling -
2017 Poster: Clustering Stable Instances of Euclidean k-means. »
Aravindan Vijayaraghavan · Abhratanu Dutta · Alex Wang -
2015 Workshop: Machine Learning For Healthcare (MLHC) »
Theofanis Karaletsos · Rajesh Ranganath · Suchi Saria · David Sontag -
2015 Poster: Barrier Frank-Wolfe for Marginal Inference »
Rahul G Krishnan · Simon Lacoste-Julien · David Sontag -
2013 Poster: Discovering Hidden Variables in Noisy-Or Networks using Quartet Tests »
Yacine Jernite · Yoni Halpern · David Sontag -
2011 Poster: Complexity of Inference in Latent Dirichlet Allocation »
David Sontag · Daniel Roy -
2011 Spotlight: Complexity of Inference in Latent Dirichlet Allocation »
David Sontag · Daniel Roy -
2010 Spotlight: More data means less inference: A pseudo-max approach to structured learning »
David Sontag · Ofer Meshi · Tommi Jaakkola · Amir Globerson -
2010 Poster: More data means less inference: A pseudo-max approach to structured learning »
David Sontag · Ofer Meshi · Tommi Jaakkola · Amir Globerson -
2009 Workshop: Approximate Learning of Large Scale Graphical Models »
Russ Salakhutdinov · Amir Globerson · David Sontag -
2008 Workshop: Approximate inference - how far have we come? »
Amir Globerson · David Sontag · Tommi Jaakkola -
2008 Poster: Clusters and Coarse Partitions in LP Relaxations »
David Sontag · Amir Globerson · Tommi Jaakkola -
2008 Spotlight: Clusters and Coarse Partitions in LP Relaxations »
David Sontag · Amir Globerson · Tommi Jaakkola -
2007 Oral: New Outer Bounds on the Marginal Polytope »
David Sontag · Tommi Jaakkola -
2007 Poster: New Outer Bounds on the Marginal Polytope »
David Sontag · Tommi Jaakkola