All machine learning systems integrate data that store human or physical knowledge with algorithms that discover patterns in that knowledge and make predictions on new instances. Even though most research attention has been focused on developing more efficient learning algorithms, it is the quality and amount of training data that predominantly govern the performance of real-world systems. This is only amplified by the recent popularity of large-scale and complicated learning systems such as deep networks, which require millions to billions of training examples to perform well. Unfortunately, the traditional methods of collecting data from specialized workers are usually expensive and slow. In recent years, however, the situation has dramatically changed with the emergence of crowdsourcing, where huge amounts of labeled data are collected from large groups of (usually online) workers at low or no cost. Many machine learning tasks, such as computer vision and natural language processing, are increasingly benefiting from data crowdsourced on platforms such as Amazon Mechanical Turk and CrowdFlower. On the other hand, tools in machine learning, game theory, and mechanism design can help to address many challenging problems in crowdsourcing systems, such as making them more reliable, more efficient, and less expensive.
In this workshop, we call attention back to sources of data, discussing cheap and fast data collection methods based on crowdsourcing and how they could impact subsequent machine learning stages.
Furthermore, we will emphasize how the data sourcing paradigm interacts with the most recent emerging trends in machine learning in the NIPS community.
Examples of topics of potential interest in the workshop include (but are not limited to):
Application of crowdsourcing to machine learning.
Reliable crowdsourcing, e.g., label aggregation, quality control.
Optimal budget allocation or active learning in crowdsourcing.
Workflow design and answer aggregation for complex tasks (e.g., machine translation, proofreading).
Pricing and incentives in crowdsourcing markets.
Prediction markets / information markets and their connection to learning.
Theoretical analysis for crowdsourcing algorithms, e.g., error rates and sample complexities for label aggregation and budget allocation algorithms.
Author Information
Jennifer Wortman Vaughan (Microsoft Research)

Jenn Wortman Vaughan is a Senior Principal Researcher at Microsoft Research, New York City. Her research background is in machine learning and algorithmic economics. She is especially interested in the interaction between people and AI, and has often studied this interaction in the context of prediction markets and other crowdsourcing systems. In recent years, she has turned her attention to human-centered approaches to transparency, interpretability, and fairness in machine learning as part of MSR's FATE group and as co-chair of Microsoft’s Aether Working Group on Transparency. Jenn came to MSR in 2012 from UCLA, where she was an assistant professor in the computer science department. She completed her Ph.D. at the University of Pennsylvania in 2009, and subsequently spent a year as a Computing Innovation Fellow at Harvard. She is the recipient of Penn's 2009 Rubinoff dissertation award for innovative applications of computer technology, a National Science Foundation CAREER award, a Presidential Early Career Award for Scientists and Engineers (PECASE), and a handful of best paper awards. In her "spare" time, Jenn is involved in a variety of efforts to provide support for women in computer science; most notably, she co-founded the Annual Workshop for Women in Machine Learning, which has been held each year since 2006.
Greg Stoddard (Northwestern University)
Chien-Ju Ho (UCLA)
Adish Singla (MPI-SWS)
Michael Bernstein (Stanford University)
Devavrat Shah (Massachusetts Institute of Technology)
Devavrat Shah is a professor of Electrical Engineering & Computer Science and Director of Statistics and Data Science at MIT. He received his PhD in Computer Science from Stanford. He received the Erlang Prize from the Applied Probability Society of INFORMS in 2010 and a NeurIPS best paper award in 2008.
Arpita Ghosh (Cornell University)
Evgeniy Gabrilovich (Google)
Denny Zhou (Microsoft Research Redmond)
Nikhil Devanur (Microsoft Research)
Xi Chen (NYU)
Xi Chen is an associate professor with tenure at the Stern School of Business at New York University, where he is also an affiliated professor of Computer Science and the Center for Data Science. Before that, he was a postdoc in the group of Prof. Michael Jordan at UC Berkeley. He obtained his Ph.D. from the Machine Learning Department at Carnegie Mellon University (CMU). He studies high-dimensional statistical learning, online learning, large-scale stochastic optimization, and applications to operations. He has published more than 20 journal articles in statistics, machine learning, and operations, and 30 papers in top peer-reviewed machine learning conference proceedings. He received the NSF CAREER Award, the ICSA Outstanding Young Researcher Award, and Faculty Research Awards from Google, Adobe, Alibaba, and Bloomberg, and was featured in the Forbes list of “30 Under 30 in Science”.
Alexander Ihler (UC Irvine)
Qiang Liu (UC Irvine)
Genevieve Patterson (Climate Change AI)
Ashwinkumar Badanidiyuru Varadaraja (Google Research)
Hossein Azari Soufiani (Harvard University)
Jacob Whitehill (University of California, San Diego)
More from the Same Authors
-
2021 Spotlight: Regulating algorithmic filtering on social media »
Sarah Cen · Devavrat Shah -
2021 : GAM Changer: Editing Generalized Additive Models with Interactive Visualization »
Zijie Jay Wang · Harsha Nori · Duen Horng Chau · Jennifer Wortman Vaughan · Rich Caruana -
2021 : Temporal-Difference Value Estimation via Uncertainty-Guided Soft Updates »
Litian Liang · Yaosheng Xu · Stephen McAleer · Dailin Hu · Alexander Ihler · Pieter Abbeel · Roy Fox -
2021 : Regret, stability, and fairness in matching markets with bandit learners »
Sarah Cen · Devavrat Shah -
2022 : Generation Probabilities are Not Enough: Improving Error Highlighting for AI Code Suggestions »
Helena Vasconcelos · Gagan Bansal · Adam Fourney · Q.Vera Liao · Jennifer Wortman Vaughan -
2022 : Beyond Decision Recommendations: Stop Putting Machine Learning First and Design Human-Centered AI for Decision Support »
Zana Bucinca · Alexandra Chouldechova · Jennifer Wortman Vaughan · Krzysztof Z Gajos -
2022 : A Causal Inference Framework for Network Interference with Panel Data »
Sarah Cen · Anish Agarwal · Christina Yu · Devavrat Shah -
2022 : On counterfactual inference with unobserved confounding »
Abhin Shah · Raaz Dwivedi · Devavrat Shah · Gregory Wornell -
2023 Poster: Follow-ups Also Matter: Improving Contextual Bandits via Post-serving Contexts »
Chaoqi Wang · Ziyu Ye · Zhe Feng · Ashwinkumar Badanidiyuru Varadaraja · Haifeng Xu -
2023 Poster: Optimal Unbiased Randomizers for Regression with Label Differential Privacy »
Ashwinkumar Badanidiyuru Varadaraja · Badih Ghazi · Pritish Kamath · Ravi Kumar · Ethan Leeman · Pasin Manurangsi · Avinash V Varadarajan · Chiyuan Zhang -
2023 Poster: Auditing for Human Expertise »
Rohan Alur · Loren Laine · Darrick Li · Manish Raghavan · Devavrat Shah · Dennis Shung -
2023 Poster: SAMoSSA: Multivariate Singular Spectrum Analysis with Stochastic Autoregressive Noise »
Abdullah Alomar · Munther Dahleh · Sean Mann · Devavrat Shah -
2022 : Panel »
Meena Jagadeesan · Avrim Blum · Jon Kleinberg · Celestine Mendler-Dünner · Jennifer Wortman Vaughan · Chara Podimata -
2022 Poster: ELIGN: Expectation Alignment as a Multi-Agent Intrinsic Reward »
Zixian Ma · Rose Wang · Fei-Fei Li · Michael Bernstein · Ranjay Krishna -
2022 Poster: Incrementality Bidding via Reinforcement Learning under Mixed and Delayed Rewards »
Ashwinkumar Badanidiyuru Varadaraja · Zhe Feng · Tianxi Li · Haifeng Xu -
2021 : Human Computer Interaction and Crowdsourcing for Data Centric AI »
Michael Bernstein -
2021 : Fairness:: Assessing Fairness in Practice: AI Teams’ Processes, Challenges, and Needs for Support »
Michael Madaio · Hariharan Subramonyam · Jennifer Wortman Vaughan -
2021 Poster: A Computationally Efficient Method for Learning Exponential Family Distributions »
Abhin Shah · Devavrat Shah · Gregory Wornell -
2021 Poster: Regulating algorithmic filtering on social media »
Sarah Cen · Devavrat Shah -
2021 Poster: Change Point Detection via Multivariate Singular Spectrum Analysis »
Arwa Alanqary · Abdullah Alomar · Devavrat Shah -
2021 Poster: PerSim: Data-Efficient Offline Reinforcement Learning with Heterogeneous Agents via Personalized Simulators »
Anish Agarwal · Abdullah Alomar · Varkey Alumootil · Devavrat Shah · Dennis Shen · Zhi Xu · Cindy Yang -
2020 : Q & A and Panel Session with Tom Mitchell, Jenn Wortman Vaughan, Sanjoy Dasgupta, and Finale Doshi-Velez »
Tom Mitchell · Jennifer Wortman Vaughan · Sanjoy Dasgupta · Finale Doshi-Velez · Zachary Lipton -
2020 Poster: Estimation of Skill Distribution from a Tournament »
Ali Jadbabaie · Anuran Makur · Devavrat Shah -
2020 Spotlight: Estimation of Skill Distribution from a Tournament »
Ali Jadbabaie · Anuran Makur · Devavrat Shah -
2020 Poster: Sample Efficient Reinforcement Learning via Low-Rank Matrix Estimation »
Devavrat Shah · Dogyoon Song · Zhi Xu · Yuzhe Yang -
2020 Demonstration: tspDB: Time Series Predict DB »
Anish Agarwal · Abdullah Alomar · Devavrat Shah -
2019 : Poster Session »
Ethan Harris · Tom White · Oh Hyeon Choung · Takashi Shinozaki · Dipan Pal · Katherine L. Hermann · Judy Borowski · Camilo Fosco · Chaz Firestone · Vijay Veerabadran · Benjamin Lahner · Chaitanya Ryali · Fenil Doshi · Pulkit Singh · Sharon Zhou · Michel Besserve · Michael Chang · Anelise Newman · Mahesan Niranjan · Jonathon Hare · Daniela Mihai · Marios Savvides · Simon Kornblith · Christina M Funke · Aude Oliva · Virginia de Sa · Dmitry Krotov · Colin Conwell · George Alvarez · Alex Kolchinski · Shengjia Zhao · Mitchell Gordon · Michael Bernstein · Stefano Ermon · Arash Mehrjou · Bernhard Schölkopf · John Co-Reyes · Michael Janner · Jiajun Wu · Josh Tenenbaum · Sergey Levine · Yalda Mohsenzadeh · Zhenglong Zhou -
2019 Poster: On Robustness of Principal Component Regression »
Anish Agarwal · Devavrat Shah · Dennis Shen · Dogyoon Song -
2019 Oral: On Robustness of Principal Component Regression »
Anish Agarwal · Devavrat Shah · Dennis Shen · Dogyoon Song -
2019 Poster: HYPE: A Benchmark for Human eYe Perceptual Evaluation of Generative Models »
Sharon Zhou · Mitchell Gordon · Ranjay Krishna · Austin Narcomey · Li Fei-Fei · Michael Bernstein -
2019 Oral: HYPE: A Benchmark for Human eYe Perceptual Evaluation of Generative Models »
Sharon Zhou · Mitchell Gordon · Ranjay Krishna · Austin Narcomey · Li Fei-Fei · Michael Bernstein -
2019 Tutorial: Synthetic Control »
Alberto Abadie · Vishal Misra · Devavrat Shah -
2018 Poster: Lifted Weighted Mini-Bucket »
Nicholas Gallo · Alexander Ihler -
2018 Poster: Near-Optimal Policies for Dynamic Multinomial Logit Assortment Selection Models »
Yining Wang · Xi Chen · Yuan Zhou -
2018 Poster: Q-learning with Nearest Neighbors »
Devavrat Shah · Qiaomin Xie -
2017 : The Unfair Externalities of Exploration »
Aleksandrs Slivkins · Jennifer Wortman Vaughan -
2017 Workshop: Teaching Machines, Robots, and Humans »
Maya Cakmak · Anna Rafferty · Adish Singla · Jerry Zhu · Sandra Zilles -
2017 Workshop: NIPS Highlights (MLTrain), Learn How to code a paper with state of the art frameworks »
Alex Dimakis · Nikolaos Vasiloglou · Guy Van den Broeck · Alexander Ihler · Assaf Araki -
2017 : Poster spotlights »
Hiroshi Kuwajima · Masayuki Tanaka · Qingkai Liang · Matthieu Komorowski · Fanyu Que · Thalita F Drumond · Aniruddh Raghu · Leo Anthony Celi · Christina Göpfert · Andrew Ross · Sarah Tan · Rich Caruana · Yin Lou · Devinder Kumar · Graham Taylor · Forough Poursabzi-Sangdeh · Jennifer Wortman Vaughan · Hanna Wallach -
2017 Workshop: Nearest Neighbors for Modern Applications with Massive Data: An Age-old Solution with New Challenges »
George H Chen · Devavrat Shah · Christina Lee -
2017 Workshop: Learning in the Presence of Strategic Behavior »
Nika Haghtalab · Yishay Mansour · Tim Roughgarden · Vasilis Syrgkanis · Jennifer Wortman Vaughan -
2017 Poster: A Decomposition of Forecast Error in Prediction Markets »
Miro Dudik · Sebastien Lahaie · Ryan Rogers · Jennifer Wortman Vaughan -
2017 Poster: Thy Friend is My Friend: Iterative Collaborative Filtering for Sparse Matrix Estimation »
Christian Borgs · Jennifer Chayes · Christina Lee · Devavrat Shah -
2017 Poster: Dynamic Importance Sampling for Anytime Bounds of the Partition Function »
Qi Lou · Rina Dechter · Alexander Ihler -
2016 : Jennifer Wortman Vaughan: "The Communication Network Within the Crowd" »
Jennifer Wortman Vaughan -
2016 Workshop: Crowdsourcing and Machine Learning »
Adish Singla · Rafael Frongillo · Matteo Venanzi -
2016 Poster: On the Recursive Teaching Dimension of VC Classes »
Peter Chen · Xi Chen · Yu Cheng · Bo Tang -
2016 Poster: Linear Contextual Bandits with Knapsacks »
Shipra Agrawal · Nikhil Devanur -
2016 Poster: Blind Regression: Nonparametric Regression for Latent Variable Models via Collaborative Filtering »
Dogyoon Song · Christina Lee · Yihua Li · Devavrat Shah -
2016 Poster: InfoGAN: Interpretable Representation Learning by Information Maximizing Generative Adversarial Nets »
Xi Chen · Peter Chen · Yan Duan · Rein Houthooft · John Schulman · Ilya Sutskever · Pieter Abbeel -
2016 Poster: VIME: Variational Information Maximizing Exploration »
Rein Houthooft · Xi Chen · Peter Chen · Yan Duan · John Schulman · Filip De Turck · Pieter Abbeel -
2016 Poster: Learning Infinite RBMs with Frank-Wolfe »
Wei Ping · Qiang Liu · Alexander Ihler -
2016 Poster: Improving Variational Autoencoders with Inverse Autoregressive Flow »
Diederik Kingma · Tim Salimans · Rafal Jozefowicz · Peter Chen · Xi Chen · Ilya Sutskever · Max Welling -
2016 Poster: Improved Techniques for Training GANs »
Tim Salimans · Ian Goodfellow · Wojciech Zaremba · Vicki Cheung · Alec Radford · Peter Chen · Xi Chen -
2016 Tutorial: Crowdsourcing: Beyond Label Generation »
Jennifer Wortman Vaughan -
2015 Poster: Double or Nothing: Multiplicative Incentive Mechanisms for Crowdsourcing »
Nihar Bhadresh Shah · Denny Zhou -
2015 Poster: Probabilistic Variational Bounds for Graphical Models »
Qiang Liu · John Fisher III · Alexander Ihler -
2015 Poster: Decomposition Bounds for Marginal MAP »
Wei Ping · Qiang Liu · Alexander Ihler -
2014 Workshop: NIPS’14 Workshop on Crowdsourcing and Machine Learning »
David Parkes · Denny Zhou · Chien-Ju Ho · Nihar Bhadresh Shah · Adish Singla · Jared Heyman · Edwin Simpson · Andreas Krause · Rafael Frongillo · Jennifer Wortman Vaughan · Panagiotis Papadimitriou · Damien Peters -
2014 Workshop: Analysis of Rank Data: Confluence of Social Choice, Operations Research, and Machine Learning »
Shivani Agarwal · Hossein Azari Soufiani · Guy Bresler · Sewoong Oh · David Parkes · Arun Rajkumar · Devavrat Shah -
2014 Workshop: NIPS Workshop on Transactional Machine Learning and E-Commerce »
David Parkes · David H Wolpert · Jennifer Wortman Vaughan · Jacob D Abernethy · Amos Storkey · Mark Reid · Ping Jin · Nihar Bhadresh Shah · Mehryar Mohri · Luis E Ortiz · Robin Hanson · Aaron Roth · Satyen Kale · Sebastien Lahaie -
2014 Poster: A Statistical Decision-Theoretic Framework for Social Choice »
Hossein Azari Soufiani · David Parkes · Lirong Xia -
2014 Poster: Hardness of parameter estimation in graphical models »
Guy Bresler · David Gamarnik · Devavrat Shah -
2014 Oral: A Statistical Decision-Theoretic Framework for Social Choice »
Hossein Azari Soufiani · David Parkes · Lirong Xia -
2014 Session: Oral Session 9 »
Jennifer Wortman Vaughan -
2014 Poster: Spectral Methods meet EM: A Provably Optimal Algorithm for Crowdsourcing »
Yuchen Zhang · Xi Chen · Denny Zhou · Michael Jordan -
2014 Spotlight: Spectral Methods meet EM: A Provably Optimal Algorithm for Crowdsourcing »
Yuchen Zhang · Xi Chen · Denny Zhou · Michael Jordan -
2014 Poster: A Latent Source Model for Online Collaborative Filtering »
Guy Bresler · George H Chen · Devavrat Shah -
2014 Spotlight: A Latent Source Model for Online Collaborative Filtering »
Guy Bresler · George H Chen · Devavrat Shah -
2014 Poster: Distributed Estimation, Information Loss and Exponential Families »
Qiang Liu · Alexander Ihler -
2014 Poster: Learning Mixed Multinomial Logit Model from Ordinal Data »
Sewoong Oh · Devavrat Shah -
2014 Poster: Structure learning of antiferromagnetic Ising models »
Guy Bresler · David Gamarnik · Devavrat Shah -
2013 Poster: A Latent Source Model for Nonparametric Time Series Classification »
George H Chen · Stanislav Nikolov · Devavrat Shah -
2013 Poster: Scoring Workers in Crowdsourcing: How Many Control Questions are Enough? »
Qiang Liu · Alexander Ihler · Mark Steyvers -
2013 Spotlight: Scoring Workers in Crowdsourcing: How Many Control Questions are Enough? »
Qiang Liu · Alexander Ihler · Mark Steyvers -
2013 Poster: Variance Reduction for Stochastic Gradient Optimization »
Chong Wang · Xi Chen · Alexander Smola · Eric Xing -
2013 Poster: Variational Planning for Graph-based MDPs »
Qiang Cheng · Qiang Liu · Feng Chen · Alexander Ihler -
2013 Poster: Generalized Random Utility Models with Multiple Types »
Hossein Azari Soufiani · Hansheng Diao · Zhenyu Lai · David Parkes -
2013 Poster: Computing the Stationary Distribution Locally »
Christina Lee · Asuman Ozdaglar · Devavrat Shah -
2013 Poster: Generalized Method-of-Moments for Rank Aggregation »
Hossein Azari Soufiani · William Z Chen · David Parkes · Lirong Xia -
2012 Workshop: Personalizing education with machine learning »
Michael Mozer · javier r movellan · Robert Lindsey · Jacob Whitehill -
2012 Poster: Learning from the Wisdom of Crowds by Minimax Entropy »
Denny Zhou · John C Platt · Sumit Basu · Yi Mao -
2012 Poster: Iterative ranking from pair-wise comparisons »
Sahand N Negahban · Sewoong Oh · Devavrat Shah -
2012 Poster: Random Utility Theory for Social Choice: Theory and Algorithms »
Hossein Azari Soufiani · David C Parkes · Lirong Xia -
2012 Poster: Variational Inference for Crowdsourcing »
Qiang Liu · Jian Peng · Alexander Ihler -
2012 Spotlight: Iterative ranking from pair-wise comparisons »
Sahand N Negahban · Sewoong Oh · Devavrat Shah -
2012 Poster: Optimal Regularized Dual Averaging Methods for Stochastic Optimization »
Xi Chen · Qihang Lin · Javier Pena -
2012 Poster: Clustering by Nonnegative Matrix Factorization Using Graph Random Walk »
Zhirong Yang · Tele Hao · Onur Dikmen · Xi Chen · Erkki Oja -
2011 Workshop: 2nd Workshop on Computational Social Science and the Wisdom of Crowds »
Winter Mason · Jennifer Wortman Vaughan · Hanna Wallach -
2011 Workshop: Relations between machine learning problems - an approach to unify the field »
Robert Williamson · John Langford · Ulrike von Luxburg · Mark Reid · Jennifer Wortman Vaughan -
2011 Poster: Iterative Learning for Reliable Crowdsourcing Systems »
David R Karger · Sewoong Oh · Devavrat Shah -
2011 Oral: Iterative Learning for Reliable Crowdsourcing Systems »
David R Karger · Sewoong Oh · Devavrat Shah -
2010 Workshop: Computational Social Science and the Wisdom of Crowds »
Jennifer Wortman Vaughan · Hanna Wallach -
2010 Spotlight: Graph-Valued Regression »
Han Liu · Xi Chen · John Lafferty · Larry Wasserman -
2010 Poster: Multivariate Dyadic Regression Trees for Sparse Learning Problems »
Han Liu · Xi Chen -
2010 Poster: Graph-Valued Regression »
Han Liu · Xi Chen · John Lafferty · Larry Wasserman -
2009 Poster: A Data-Driven Approach to Modeling Choice »
Vivek Farias · Srikanth Jagabathula · Devavrat Shah -
2009 Poster: Particle-based Variational Inference for Continuous Systems »
Alexander Ihler · Andrew Frank · Padhraic Smyth -
2009 Spotlight: A Data-Driven Approach to Modeling Choice »
Vivek Farias · Srikanth Jagabathula · Devavrat Shah -
2009 Poster: Local Rules for Global MAP: When Do They Work ? »
Kyomin Jung · Pushmeet Kohli · Devavrat Shah -
2009 Poster: Nonparametric Greedy Algorithms for the Sparse Learning Problem »
Han Liu · Xi Chen -
2009 Poster: Whose Vote Should Count More: Optimal Integration of Labels from Labelers of Unknown Expertise »
Jacob Whitehill · Paul L Ruvolo · Ting-fan Wu · Jacob Bergsma · javier r movellan -
2008 Poster: Inferring rankings under constrained sensing »
Srikanth Jagabathula · Devavrat Shah -
2008 Demonstration: Machine Perception for Human Machine Interaction »
Paul L Ruvolo · Marian S Bartlett · Nicholas J Butko · Claudia Lainscsek · Gwendolen C Littlewort · Jacob Whitehill · Tingfan Wu · javier r movellan -
2008 Oral: Inferring rankings under constrained sensing »
Srikanth Jagabathula · Devavrat Shah -
2007 Workshop: Machine Learning for Web Search »
Denny Zhou · Olivier Chapelle · Thorsten Joachims · Thomas Hofmann -
2007 Spotlight: Message Passing for Max-weight Independent Set »
Sujay Sanghavi · Devavrat Shah · Alan S Willsky -
2007 Spotlight: Privacy-Preserving Belief Propagation and Sampling »
Michael Kearns · Jinsong Tan · Jennifer Wortman Vaughan -
2007 Poster: Message Passing for Max-weight Independent Set »
Sujay Sanghavi · Devavrat Shah · Alan S Willsky -
2007 Poster: Privacy-Preserving Belief Propagation and Sampling »
Michael Kearns · Jinsong Tan · Jennifer Wortman Vaughan -
2007 Poster: Learning Bounds for Domain Adaptation »
John Blitzer · Yacov Crammer · Alex Kulesza · Fernando Pereira · Jennifer Wortman Vaughan -
2007 Poster: Local Algorithms for Approximate Inference in Minor-Excluded Graphs »
Kyomin Jung · Devavrat Shah -
2006 Poster: Learning from Multiple Sources »
Yacov Crammer · Michael Kearns · Jennifer Wortman Vaughan -
2006 Poster: Learning Time-Intensity Profiles of Human Activity using Non-Parametric Bayesian Models »
Alexander Ihler · Padhraic Smyth