Modern deep learning systems require huge data sets to achieve impressive performance, but there is little guidance on how much or what kind of data to collect. Over-collecting data incurs unnecessary present costs, while under-collecting may incur future costs and delay workflows. We propose a new paradigm for modeling the data collection workflow as a formal optimal data collection problem that allows designers to specify performance targets, collection costs, a time horizon, and penalties for failing to meet the targets. Additionally, this formulation generalizes to tasks requiring multiple data sources, such as labeled and unlabeled data used in semi-supervised learning. To solve our problem, we develop Learn-Optimize-Collect (LOC), which minimizes expected future collection costs. Finally, we numerically compare our framework to the conventional baseline of estimating data requirements by extrapolating from neural scaling laws. We significantly reduce the risks of failing to meet desired performance targets on several classification, segmentation, and detection tasks, while maintaining low total collection costs.
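The baseline mentioned above, estimating data requirements by extrapolating from neural scaling laws, can be illustrated with a short sketch. This is not the paper's LOC method; it is a minimal illustration of the conventional baseline, assuming a common power-law ansatz error(n) ≈ a·n^(−b) and hypothetical pilot-run statistics:

```python
import numpy as np

# Hypothetical pilot statistics (assumed, for illustration): dataset sizes
# and the validation error observed after training at each size.
sizes = np.array([1_000, 2_000, 4_000, 8_000], dtype=float)
errors = np.array([0.30, 0.24, 0.19, 0.15])

# A common scaling-law ansatz is error(n) ~ a * n**(-b); in log-log space
# this is a straight line, so ordinary least squares recovers (a, b).
slope, intercept = np.polyfit(np.log(sizes), np.log(errors), deg=1)
a, b = np.exp(intercept), -slope

# Invert the fitted law to extrapolate how much data a target error needs.
target_error = 0.10
n_required = (a / target_error) ** (1.0 / b)
print(f"fitted law: {a:.2f} * n^(-{b:.2f}); estimated samples: {n_required:,.0f}")
```

Because the extrapolated estimate is a point prediction with no notion of collection cost or risk, undershooting the true requirement forces additional collection rounds; this is the failure mode that the formulation above penalizes explicitly.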
Author Information
Rafid Mahmood (NVIDIA)
James Lucas (University of Toronto)
Jose M. Alvarez (NVIDIA)
Sanja Fidler (NVIDIA / University of Toronto)
Marc Law (NVIDIA)
More from the Same Authors
- 2021 Spotlight: Ultrahyperbolic Neural Networks
  Marc Law
- 2022 Poster: Structural Pruning via Latency-Saliency Knapsack
  Maying Shen · Hongxu Yin · Pavlo Molchanov · Lei Mao · Jianna Liu · Jose M. Alvarez
- 2022: How many trained neural networks are needed for influence estimation in modern deep learning?
  Sasha (Alexandre) Doubov · Tianshi Cao · David Acuna · Sanja Fidler
- 2022 Spotlight: Lightning Talks 6B-2
  Alexander Korotin · Jinyuan Jia · Weijian Deng · Shi Feng · Maying Shen · Denizalp Goktas · Fang-Yi Yu · Alexander Kolesov · Sadie Zhao · Stephen Gould · Hongxu Yin · Wenjie Qu · Liang Zheng · Evgeny Burnaev · Amy Greenwald · Neil Gong · Pavlo Molchanov · Yiling Chen · Lei Mao · Jianna Liu · Jose M. Alvarez
- 2022 Spotlight: Structural Pruning via Latency-Saliency Knapsack
  Maying Shen · Hongxu Yin · Pavlo Molchanov · Lei Mao · Jianna Liu · Jose M. Alvarez
- 2022 Spotlight: GET3D: A Generative Model of High Quality 3D Textured Shapes Learned from Images
  Jun Gao · Tianchang Shen · Zian Wang · Wenzheng Chen · Kangxue Yin · Daiqing Li · Or Litany · Zan Gojcic · Sanja Fidler
- 2022 Poster: EPIC-KITCHENS VISOR Benchmark: VIdeo Segmentations and Object Relations
  Ahmad Darkhalil · Dandan Shan · Bin Zhu · Jian Ma · Amlan Kar · Richard Higgins · Sanja Fidler · David Fouhey · Dima Damen
- 2022 Poster: LION: Latent Point Diffusion Models for 3D Shape Generation
  xiaohui zeng · Arash Vahdat · Francis Williams · Zan Gojcic · Or Litany · Sanja Fidler · Karsten Kreis
- 2022 Poster: GET3D: A Generative Model of High Quality 3D Textured Shapes Learned from Images
  Jun Gao · Tianchang Shen · Zian Wang · Wenzheng Chen · Kangxue Yin · Daiqing Li · Or Litany · Zan Gojcic · Sanja Fidler
- 2021 Poster: Ultrahyperbolic Neural Networks
  Marc Law
- 2021 Poster: Distilling Image Classifiers in Object Detectors
  Shuxuan Guo · Jose M. Alvarez · Mathieu Salzmann
- 2021 Poster: SegFormer: Simple and Efficient Design for Semantic Segmentation with Transformers
  Enze Xie · Wenhai Wang · Zhiding Yu · Anima Anandkumar · Jose M. Alvarez · Ping Luo
- 2020: Poster Session 3 (gather.town)
  Denny Wu · Chengrun Yang · Tolga Ergen · sanae lotfi · Charles Guille-Escuret · Boris Ginsburg · Hanbake Lyu · Cong Xie · David Newton · Debraj Basu · Yewen Wang · James Lucas · MAOJIA LI · Lijun Ding · Jose Javier Gonzalez Ortiz · Reyhane Askari Hemmat · Zhiqi Bu · Neal Lawton · Kiran Thekumparampil · Jiaming Liang · Lindon Roberts · Jingyi Zhu · Dongruo Zhou
- 2020 Poster: Ultrahyperbolic Representation Learning
  Marc Law · Jos Stam
- 2020 Poster: Regularized linear autoencoders recover the principal components, eventually
  Xuchan Bao · James Lucas · Sushant Sachdeva · Roger Grosse
- 2019: James Lucas, "Information-theoretic limitations on novel task generalization"
  James Lucas
- 2019: Break / Poster Session 1
  Antonia Marcu · Yao-Yuan Yang · Pascale Gourdeau · Chen Zhu · Thodoris Lykouris · Jianfeng Chi · Mark Kozdoba · Arjun Nitin Bhagoji · Xiaoxia Wu · Jay Nandy · Michael T Smith · Bingyang Wen · Yuege Xie · Konstantinos Pitas · Suprosanna Shit · Maksym Andriushchenko · Dingli Yu · Gaël Letarte · Misha Khodak · Hussein Mozannar · Chara Podimata · James Foulds · Yizhen Wang · Huishuai Zhang · Ondrej Kuzelka · Alexander Levine · Nan Lu · Zakaria Mhammedi · Paul Viallard · Diana Cai · Lovedeep Gondara · James Lucas · Yasaman Mahdaviyeh · Aristide Baratin · Rishi Bommasani · Alessandro Barp · Andrew Ilyas · Kaiwen Wu · Jens Behrmann · Omar Rivasplata · Amir Nazemi · Aditi Raghunathan · Will Stephenson · Sahil Singla · Akhil Gupta · YooJung Choi · Yannic Kilcher · Clare Lyle · Edoardo Manino · Andrew Bennett · Zhi Xu · Niladri Chatterji · Emre Barut · Flavien Prost · Rodrigo Toro Icarte · Arno Blaas · Chulhee Yun · Sahin Lale · YiDing Jiang · Tharun Kumar Reddy Medini · Ashkan Rezaei · Alexander Meinke · Stephen Mell · Gary Kazantsev · Shivam Garg · Aradhana Sinha · Vishnu Lokhande · Geovani Rizk · Han Zhao · Aditya Kumar Akash · Jikai Hou · Ali Ghodsi · Matthias Hein · Tyler Sypherd · Yichen Yang · Anastasia Pentina · Pierre Gillot · Antoine Ledent · Guy Gur-Ari · Noah MacAulay · Tianzong Zhang
- 2019 Poster: Lookahead Optimizer: k steps forward, 1 step back
  Michael Zhang · James Lucas · Jimmy Ba · Geoffrey E Hinton
- 2019 Poster: Preventing Gradient Attenuation in Lipschitz Constrained Convolutional Networks
  Qiyang Li · Saminul Haque · Cem Anil · James Lucas · Roger Grosse · Joern-Henrik Jacobsen
- 2019 Poster: Don't Blame the ELBO! A Linear VAE Perspective on Posterior Collapse
  James Lucas · George Tucker · Roger Grosse · Mohammad Norouzi
- 2018: Poster Session I
  Aniruddh Raghu · Daniel Jarrett · Kathleen Lewis · Elias Chaibub Neto · Nicholas Mastronarde · Shazia Akbar · Chun-Hung Chao · Henghui Zhu · Seth Stafford · Luna Zhang · Jen-Tang Lu · Changhee Lee · Adityanarayanan Radhakrishnan · Fabian Falck · Liyue Shen · Daniel Neil · Yusuf Roohani · Aparna Balagopalan · Brett Marinelli · Hagai Rossman · Sven Giesselbach · Jose Javier Gonzalez Ortiz · Edward De Brouwer · Byung-Hoon Kim · Rafid Mahmood · Tzu Ming Hsu · Antonio Ribeiro · Rumi Chunara · Agni Orfanoudaki · Kristen Severson · Mingjie Mai · Sonali Parbhoo · Albert Haque · Viraj Prabhu · Di Jin · Alena Harley · Geoffroy Dubourg-Felonneau · Xiaodan Hu · Maithra Raghu · Jonathan Warrell · Nelson Johansen · Wenyuan Li · Marko Järvenpää · Satya Narayan Shukla · Sarah Tan · Vincent Fortuin · Beau Norgeot · Yi-Te Hsu · Joel H Saltz · Veronica Tozzo · Andrew Miller · Guillaume Ausset · Azin Asgarian · Francesco Paolo Casale · Antoine Neuraz · Bhanu Pratap Singh Rawat · Turgay Ayer · Xinyu Li · Mehul Motani · Nathaniel Braman · Laetitia M Shao · Adrian Dalca · Hyunkwang Lee · Emma Pierson · Sandesh Ghimire · Yuji Kawai · Owen Lahav · Anna Goldenberg · Denny Wu · Pavitra Krishnaswamy · Colin Pawlowski · Arijit Ukil · Yuhui Zhang
- 2017 Poster: Compression-aware Training of Deep Networks
  Jose Alvarez · Mathieu Salzmann
- 2016 Poster: Learning the Number of Neurons in Deep Networks
  Jose M. Alvarez · Mathieu Salzmann