We present an agnostic active learning algorithm for any hypothesis class of bounded VC dimension under arbitrary data distributions. Most previous work on active learning either makes strong distributional assumptions, or else is computationally prohibitive. Our algorithm extends the simple scheme of Cohn, Atlas, and Ladner to the agnostic setting, using reductions to supervised learning that harness generalization bounds in a simple but subtle manner. We provide a fallback guarantee that bounds the algorithm's label complexity by the agnostic PAC sample complexity. Our analysis yields asymptotic label complexity improvements for certain hypothesis classes and distributions. We also demonstrate improvements experimentally.
Author Information
Sanjoy Dasgupta (UC San Diego)
Daniel Hsu (Columbia University)
Claire Monteleoni (University of Colorado Boulder)
Claire Monteleoni is an associate professor of Computer Science at University of Colorado Boulder. Previously, she was an associate professor at George Washington University, and research faculty at the Center for Computational Learning Systems, at Columbia University. She did a postdoc in Computer Science and Engineering at the University of California, San Diego, and completed her PhD and Masters in Computer Science, at MIT. She holds a Bachelors in Earth and Planetary Sciences from Harvard. Her research focuses on machine learning algorithms and theory for problems including learning from data streams, learning from raw (unlabeled) data, learning from private data, and climate informatics: accelerating discovery in climate science with machine learning. Her work on climate informatics received the Best Application Paper Award at NASA CIDU 2010. In 2011, she cofounded the International Workshop on Climate Informatics, which is now in its fourth year, attracting climate scientists and data scientists from over 14 countries and 26 states.
More from the Same Authors

2019 Poster: On the number of variables to use in principal component regression »
Ji Xu · Daniel Hsu 
2018 Poster: Interactive Structure Learning with Structural QuerybyCommittee »
Christopher Tosh · Sanjoy Dasgupta 
2018 Spotlight: Interactive Structure Learning with Structural QuerybyCommittee »
Christopher Tosh · Sanjoy Dasgupta 
2018 Poster: Overfitting or perfect fitting? Risk bounds for classification and regression rules that interpolate »
Mikhail Belkin · Daniel Hsu · Partha Mitra 
2018 Poster: Benefits of overparameterization with EM »
Ji Xu · Daniel Hsu · Arian Maleki 
2018 Poster: Leveraged volume sampling for linear regression »
Michal Derezinski · Manfred K. Warmuth · Daniel Hsu 
2018 Spotlight: Leveraged volume sampling for linear regression »
Michal Derezinski · Manfred K. Warmuth · Daniel Hsu 
2017 Poster: Linear regression without correspondence »
Daniel Hsu · Kevin Shi · Xiaorui Sun 
2016 Poster: An algorithm for L1 nearest neighbor search via monotonic embedding »
Xinan Wang · Sanjoy Dasgupta 
2016 Poster: Global Analysis of Expectation Maximization for Mixtures of Two Gaussians »
Ji Xu · Daniel Hsu · Arian Maleki 
2016 Oral: Global Analysis of Expectation Maximization for Mixtures of Two Gaussians »
Ji Xu · Daniel Hsu · Arian Maleki 
2016 Poster: Search Improves Label for Active Learning »
Alina Beygelzimer · Daniel Hsu · John Langford · Chicheng Zhang 
2015 Poster: Mixing Time Estimation in Reversible Markov Chains from a Single Sample Path »
Daniel Hsu · Aryeh Kontorovich · Csaba Szepesvari 
2015 Poster: Efficient and Parsimonious Agnostic Active Learning »
TzuKuo Huang · Alekh Agarwal · Daniel Hsu · John Langford · Robert Schapire 
2015 Spotlight: Efficient and Parsimonious Agnostic Active Learning »
TzuKuo Huang · Alekh Agarwal · Daniel Hsu · John Langford · Robert Schapire 
2014 Poster: Rates of Convergence for Nearest Neighbor Classification »
Kamalika Chaudhuri · Sanjoy Dasgupta 
2014 Spotlight: Rates of Convergence for Nearest Neighbor Classification »
Kamalika Chaudhuri · Sanjoy Dasgupta 
2014 Poster: Scalable Nonlinear Learning with Adaptive Polynomial Expansions »
Alekh Agarwal · Alina Beygelzimer · Daniel Hsu · John Langford · Matus J Telgarsky 
2014 Poster: The Large Margin Mechanism for Differentially Private Maximization »
Kamalika Chaudhuri · Daniel Hsu · Shuang Song 
2014 Tutorial: Climate Change: Challenges for Machine Learning »
Arindam Banerjee · Claire Monteleoni 
2013 Workshop: Workshop on Spectral Learning »
Byron Boots · Daniel Hsu · Borja Balle 
2013 Poster: The Fast Convergence of Incremental PCA »
Akshay Balsubramani · Sanjoy Dasgupta · Yoav Freund 
2013 Poster: When are Overcomplete Topic Models Identifiable? Uniqueness of Tensor Tucker Decompositions with Structured Sparsity »
Anima Anandkumar · Daniel Hsu · Majid Janzamin · Sham M Kakade 
2013 Poster: Contrastive Learning Using Spectral Methods »
James Y Zou · Daniel Hsu · David Parkes · Ryan Adams 
2012 Poster: Learning Mixtures of Tree Graphical Models »
Anima Anandkumar · Daniel Hsu · Furong Huang · Sham M Kakade 
2012 Poster: A Spectral Algorithm for Latent Dirichlet Allocation »
Anima Anandkumar · Dean P Foster · Daniel Hsu · Sham M Kakade · YiKai Liu 
2012 Poster: Identifiability and Unmixing of Latent Parse Trees »
Percy Liang · Sham M Kakade · Daniel Hsu 
2012 Spotlight: A Spectral Algorithm for Latent Dirichlet Allocation »
Anima Anandkumar · Dean P Foster · Daniel Hsu · Sham M Kakade · YiKai Liu 
2011 Poster: Stochastic convex optimization with bandit feedback »
Alekh Agarwal · Dean P Foster · Daniel Hsu · Sham M Kakade · Sasha Rakhlin 
2011 Poster: Spectral Methods for Learning Multivariate Latent Tree Structure »
Anima Anandkumar · Kamalika Chaudhuri · Daniel Hsu · Sham M Kakade · Le Song · Tong Zhang 
2010 Poster: Rates of convergence for the cluster tree »
Kamalika Chaudhuri · Sanjoy Dasgupta 
2010 Poster: Agnostic Active Learning Without Constraints »
Alina Beygelzimer · Daniel Hsu · John Langford · Tong Zhang 
2009 Poster: A Parameterfree Hedging Algorithm »
Kamalika Chaudhuri · Yoav Freund · Daniel Hsu 
2009 Poster: MultiLabel Prediction via Compressed Sensing »
Daniel Hsu · Sham M Kakade · John Langford · Tong Zhang 
2009 Oral: MultiLabel Prediction via Compressed Sensing »
Daniel Hsu · Sham M Kakade · John Langford · Tong Zhang 
2009 Poster: Streaming kmeans approximation »
Nir Ailon · Ragesh Jaiswal · Claire Monteleoni 
2008 Poster: Privacypreserving logistic regression »
Kamalika Chaudhuri · Claire Monteleoni 
2007 Session: Session 3: Theory and Sequential Decision Making »
Sanjoy Dasgupta 
2007 Poster: A general agnostic active learning algorithm »
Sanjoy Dasgupta · Daniel Hsu · Claire Monteleoni 
2007 Poster: Learning the structure of manifolds using random projections »
Yoav S Freund · Sanjoy Dasgupta · Mayank Kabra · Nakul Verma 
2007 Poster: A learning framework for nearest neighbor search »
Lawrence Cayton · Sanjoy Dasgupta