Timezone: »
Widely observed neural scaling laws, in which error falls off as a power of the training set size, model size, or both, have driven substantial performance improvements in deep learning. However, these improvements through scaling alone require considerable costs in compute and energy. Here we focus on the scaling of error with dataset size and show how in theory we can break beyond power law scaling and potentially even reduce it to exponential scaling instead if we have access to a high-quality data pruning metric that ranks the order in which training examples should be discarded to achieve any pruned dataset size. We then test this improved scaling prediction with pruned dataset size empirically, and indeed observe better than power law scaling in practice on ResNets trained on CIFAR-10, SVHN, and ImageNet. Next, given the importance of finding high-quality pruning metrics, we perform the first large-scale benchmarking study of ten different data pruning metrics on ImageNet. We find most existing high performing metrics scale poorly to ImageNet, while the best are computationally intensive and require labels for every image. We therefore developed a new simple, cheap and scalable self-supervised pruning metric that demonstrates comparable performance to the best supervised metrics. Overall, our work suggests that the discovery of good data-pruning metrics may provide a viable path forward to substantially improved neural scaling laws, thereby reducing the resource costs of modern deep learning.
Author Information
Ben Sorscher (Stanford University)
Robert Geirhos (Google Brain)
Shashank Shekhar (Meta AI Research)

I am an AI Resident at FAIR, Meta AI at Montreal, Canada. Prior to this, I was a research master's student at the University of Guelph and Vector Institute in Ontario, Canada. My areas of interest are self-supervised learning, representation learning, and computer vision.
Surya Ganguli (Stanford)
Ari Morcos (Meta AI, FAIR Team)
More from the Same Authors
-
2021 Spotlight: How Well do Feature Visualizations Support Causal Understanding of CNN Activations? »
Roland S. Zimmermann · Judy Borowski · Robert Geirhos · Matthias Bethge · Thomas Wallis · Wieland Brendel -
2021 : Learning Background Invariance Improves Generalization and Robustness in Self Supervised Learning on ImageNet and Beyond »
Chaitanya Ryali · David Schwab · Ari Morcos -
2021 : ImageNet suffers from dichotomous data difficulty »
Kristof Meding · Luca Schulze Buschoff · Robert Geirhos · Felix A. Wichmann -
2022 : Unmasking the Lottery Ticket Hypothesis: Efficient Adaptive Pruning for Finding Winning Tickets »
Mansheej Paul · Feng Chen · Brett Larsen · Jonathan Frankle · Surya Ganguli · Gintare Karolina Dziugaite -
2023 Poster: Stable and low-precision training for large-scale vision-language models »
Mitchell Wortsman · Tim Dettmers · Luke Zettlemoyer · Ari Morcos · Ali Farhadi · Ludwig Schmidt -
2023 Poster: Stochastic Collapse: How Gradient Noise Attracts SGD Dynamics Towards Simpler Subnetworks »
Feng Chen · Daniel Kunin · Atsushi Yamamura · Surya Ganguli -
2023 Poster: Patch n’ Pack: NaViT, a Vision Transformer for any Aspect Ratio and Resolution »
Mostafa Dehghani · Basil Mustafa · Josip Djolonga · Jonathan Heek · Matthias Minderer · Mathilde Caron · Andreas Steiner · Joan Puigcerver · Robert Geirhos · Ibrahim Alabdulmohsin · Avital Oliver · Piotr Padlewski · Alexey Gritsenko · Mario Lucic · Neil Houlsby -
2023 Poster: Information Geometry of the Retinal Representation Manifold »
Xuehao Ding · Dongsoo Lee · Joshua Melander · George Sivulka · Surya Ganguli · Stephen Baccus -
2023 Poster: Pretraining task diversity and the emergence of non-Bayesian in-context learning for regression »
Allan Raventós · Mansheej Paul · Feng Chen · Surya Ganguli -
2023 Poster: PUG: Photorealistic and Semantically Controllable Synthetic Data for Representation Learning »
Florian Bordes · Shashank Shekhar · Mark Ibrahim · Diane Bouchacourt · Pascal Vincent · Ari Morcos -
2023 Poster: D4: Improving LLM Pretraining via Document De-Duplication and Diversification »
Kushal Tirumala · Daniel Simig · Armen Aghajanyan · Ari Morcos -
2022 Poster: Lottery Tickets on a Data Diet: Finding Initializations with Sparse Trainable Networks »
Mansheej Paul · Brett Larsen · Surya Ganguli · Jonathan Frankle · Gintare Karolina Dziugaite -
2022 Poster: A theory of weight distribution-constrained learning »
Weishun Zhong · Ben Sorscher · Daniel Lee · Haim Sompolinsky -
2021 : Session 3 | Invited talk: Surya Ganguli, "From the geometry of high dimensional energy landscapes to optimal annealing in a dissipative many body quantum optimizer" »
Surya Ganguli · Atilim Gunes Baydin -
2021 : Out-of-distribution robustness: Limited image exposure of a four-year-old is enough to outperform ResNet-50 »
Lukas Huber · Robert Geirhos · Felix A. Wichmann -
2021 : Neural Structure Mapping For Learning Abstract Visual Analogies »
Shashank Shekhar · Graham Taylor -
2021 Poster: How Well do Feature Visualizations Support Causal Understanding of CNN Activations? »
Roland S. Zimmermann · Judy Borowski · Robert Geirhos · Matthias Bethge · Thomas Wallis · Wieland Brendel -
2021 Poster: Deep Learning on a Data Diet: Finding Important Examples Early in Training »
Mansheej Paul · Surya Ganguli · Gintare Karolina Dziugaite -
2021 Oral: Partial success in closing the gap between human and machine vision »
Robert Geirhos · Kantharaju Narayanappa · Benjamin Mitzkus · Tizian Thieringer · Matthias Bethge · Felix A. Wichmann · Wieland Brendel -
2021 Poster: Partial success in closing the gap between human and machine vision »
Robert Geirhos · Kantharaju Narayanappa · Benjamin Mitzkus · Tizian Thieringer · Matthias Bethge · Felix A. Wichmann · Wieland Brendel -
2021 Poster: Grounding inductive biases in natural images: invariance stems from variations in data »
Diane Bouchacourt · Mark Ibrahim · Ari Morcos -
2020 Poster: Deep learning versus kernel learning: an empirical study of loss landscape geometry and the time evolution of the Neural Tangent Kernel »
Stanislav Fort · Gintare Karolina Dziugaite · Mansheej Paul · Sepideh Kharaghani · Daniel Roy · Surya Ganguli -
2020 Poster: Beyond accuracy: quantifying trial-by-trial behaviour of CNNs and humans by measuring error consistency »
Robert Geirhos · Kristof Meding · Felix A. Wichmann -
2020 Poster: Predictive coding in balanced neural networks with noise, chaos and delays »
Jonathan Kadmon · Jonathan Timcheck · Surya Ganguli -
2020 Poster: Identifying Learning Rules From Neural Network Observables »
Aran Nayebi · Sanjana Srivastava · Surya Ganguli · Daniel Yamins -
2020 Spotlight: Identifying Learning Rules From Neural Network Observables »
Aran Nayebi · Sanjana Srivastava · Surya Ganguli · Daniel Yamins -
2020 Poster: The Generalization-Stability Tradeoff In Neural Network Pruning »
Brian Bartoldson · Ari Morcos · Adrian Barbu · Gordon Erlebacher -
2020 Poster: Pruning neural networks without any data by iteratively conserving synaptic flow »
Hidenori Tanaka · Daniel Kunin · Daniel Yamins · Surya Ganguli -
2019 : Panel Session: A new hope for neuroscience »
Yoshua Bengio · Blake Richards · Timothy Lillicrap · Ila Fiete · David Sussillo · Doina Precup · Konrad Kording · Surya Ganguli -
2019 : Contributed Session - Spotlight Talks »
Jonathan Frankle · David Schwab · Ari Morcos · Qianli Ma · Yao-Hung Hubert Tsai · Ruslan Salakhutdinov · YiDing Jiang · Dilip Krishnan · Hossein Mobahi · Samy Bengio · Sho Yaida · Muqiao Yang -
2019 : Poster Session »
Pravish Sainath · Mohamed Akrout · Charles Delahunt · Nathan Kutz · Guangyu Robert Yang · Joseph Marino · L F Abbott · Nicolas Vecoven · Damien Ernst · andrew warrington · Michael Kagan · Kyunghyun Cho · Kameron Harris · Leopold Grinberg · John J. Hopfield · Dmitry Krotov · Taliah Muhammad · Erick Cobos · Edgar Walker · Jacob Reimer · Andreas Tolias · Alexander Ecker · Janaki Sheth · Yu Zhang · Maciej Wołczyk · Jacek Tabor · Szymon Maszke · Roman Pogodin · Dane Corneil · Wulfram Gerstner · Baihan Lin · Guillermo Cecchi · Jenna M Reinen · Irina Rish · Guillaume Bellec · Darjan Salaj · Anand Subramoney · Wolfgang Maass · Yueqi Wang · Ari Pakman · Jin Hyung Lee · Liam Paninski · Bryan Tripp · Colin Graber · Alex Schwing · Luke Prince · Gabriel Ocker · Michael Buice · Benjamin Lansdell · Konrad Kording · Jack Lindsey · Terrence Sejnowski · Matthew Farrell · Eric Shea-Brown · Nicolas Farrugia · Victor Nepveu · Jiwoong Im · Kristin Branson · Brian Hu · Ramakrishnan Iyer · Stefan Mihalas · Sneha Aenugu · Hananel Hazan · Sihui Dai · Tan Nguyen · Doris Tsao · Richard Baraniuk · Anima Anandkumar · Hidenori Tanaka · Aran Nayebi · Stephen Baccus · Surya Ganguli · Dean Pospisil · Eilif Muller · Jeffrey S Cheng · Gaël Varoquaux · Kamalaker Dadi · Dimitrios C Gklezakos · Rajesh PN Rao · Anand Louis · Christos Papadimitriou · Santosh Vempala · Naganand Yadati · Daniel Zdeblick · Daniela M Witten · Nicholas Roberts · Vinay Prabhu · Pierre Bellec · Poornima Ramesh · Jakob H Macke · Santiago Cadena · Guillaume Bellec · Franz Scherr · Owen Marschall · Robert Kim · Hannes Rapp · Marcio Fonseca · Oliver Armitage · Jiwoong Im · Thomas Hardcastle · Abhishek Sharma · Wyeth Bair · Adrian Valente · Shane Shang · Merav Stern · Rutuja Patil · Peter Wang · Sruthi Gorantla · Peter Stratton · Tristan Edwards · Jialin Lu · Martin Ester · Yurii Vlasov · Siavash Golkar -
2019 : Panel - The Role of Communication at Large: Aparna Lakshmiratan, Jason Yosinski, Been Kim, Surya Ganguli, Finale Doshi-Velez »
Aparna Lakshmiratan · Finale Doshi-Velez · Surya Ganguli · Zachary Lipton · Michela Paganini · Anima Anandkumar · Jason Yosinski -
2019 : Invited Talk: Theories for the emergence of internal representations in neural networks: from perception to navigation »
Surya Ganguli -
2019 : Lunch Break and Posters »
Xingyou Song · Elad Hoffer · Wei-Cheng Chang · Jeremy Cohen · Jyoti Islam · Yaniv Blumenfeld · Andreas Madsen · Jonathan Frankle · Sebastian Goldt · Satrajit Chatterjee · Abhishek Panigrahi · Alex Renda · Brian Bartoldson · Israel Birhane · Aristide Baratin · Niladri Chatterji · Roman Novak · Jessica Forde · YiDing Jiang · Yilun Du · Linara Adilova · Michael Kamp · Berry Weinstein · Itay Hubara · Tal Ben-Nun · Torsten Hoefler · Daniel Soudry · Hsiang-Fu Yu · Kai Zhong · Yiming Yang · Inderjit Dhillon · Jaime Carbonell · Yanqing Zhang · Dar Gilboa · Johannes Brandstetter · Alexander R Johansen · Gintare Karolina Dziugaite · Raghav Somani · Ari Morcos · Freddie Kalaitzis · Hanie Sedghi · Lechao Xiao · John Zech · Muqiao Yang · Simran Kaur · Qianli Ma · Yao-Hung Hubert Tsai · Ruslan Salakhutdinov · Sho Yaida · Zachary Lipton · Daniel Roy · Michael Carbin · Florent Krzakala · Lenka Zdeborová · Guy Gur-Ari · Ethan Dyer · Dilip Krishnan · Hossein Mobahi · Samy Bengio · Behnam Neyshabur · Praneeth Netrapalli · Kris Sankaran · Julien Cornebise · Yoshua Bengio · Vincent Michalski · Samira Ebrahimi Kahou · Md Rifat Arefin · Jiri Hron · Jaehoon Lee · Jascha Sohl-Dickstein · Samuel Schoenholz · David Schwab · Dongyu Li · Sang Choe · Henning Petzka · Ashish Verma · Zhichao Lin · Cristian Sminchisescu -
2019 : Surya Ganguli, Yasaman Bahri, Florent Krzakala moderated by Lenka Zdeborova »
Florent Krzakala · Yasaman Bahri · Surya Ganguli · Lenka Zdeborová · Adji Bousso Dieng · Joan Bruna -
2019 : Surya Ganguli - An analytic theory of generalization dynamics and transfer learning in deep linear networks »
Surya Ganguli -
2019 Poster: A unified theory for the origin of grid cells through the lens of pattern formation »
Ben Sorscher · Gabriel Mel · Surya Ganguli · Samuel Ocko -
2019 Poster: Universality and individuality in neural dynamics across large populations of recurrent networks »
Niru Maheswaranathan · Alex Williams · Matthew Golub · Surya Ganguli · David Sussillo -
2019 Spotlight: A unified theory for the origin of grid cells through the lens of pattern formation »
Ben Sorscher · Gabriel Mel · Surya Ganguli · Samuel Ocko -
2019 Spotlight: Universality and individuality in neural dynamics across large populations of recurrent networks »
Niru Maheswaranathan · Alex Williams · Matthew Golub · Surya Ganguli · David Sussillo -
2019 Poster: From deep learning to mechanistic understanding in neuroscience: the structure of retinal prediction »
Hidenori Tanaka · Aran Nayebi · Niru Maheswaranathan · Lane McIntosh · Stephen Baccus · Surya Ganguli -
2019 Poster: One ticket to win them all: generalizing lottery ticket initializations across datasets and optimizers »
Ari Morcos · Haonan Yu · Michela Paganini · Yuandong Tian -
2019 Poster: Reverse engineering recurrent networks for sentiment classification reveals line attractor dynamics »
Niru Maheswaranathan · Alex Williams · Matthew Golub · Surya Ganguli · David Sussillo -
2018 Poster: Generalisation in humans and deep neural networks »
Robert Geirhos · Carlos R. M. Temme · Jonas Rauber · Heiko H. Schütt · Matthias Bethge · Felix A. Wichmann -
2018 Poster: The emergence of multiple retinal cell types through efficient coding of natural movies »
Samuel Ocko · Jack Lindsey · Surya Ganguli · Stephane Deny -
2018 Poster: Insights on representational similarity in neural networks with canonical correlation »
Ari Morcos · Maithra Raghu · Samy Bengio -
2018 Poster: Statistical mechanics of low-rank tensor decomposition »
Jonathan Kadmon · Surya Ganguli -
2018 Poster: Task-Driven Convolutional Recurrent Models of the Visual System »
Aran Nayebi · Daniel Bear · Jonas Kubilius · Kohitij Kar · Surya Ganguli · David Sussillo · James J DiCarlo · Daniel Yamins -
2017 Poster: Variational Walkback: Learning a Transition Operator as a Stochastic Recurrent Net »
Anirudh Goyal · Nan Rosemary Ke · Surya Ganguli · Yoshua Bengio -
2017 Poster: Resurrecting the sigmoid in deep learning through dynamical isometry: theory and practice »
Jeffrey Pennington · Samuel Schoenholz · Surya Ganguli -
2016 : Surya Ganguli : Deep Neural Models of the Retinal Response to Natural Stimuli »
Surya Ganguli -
2016 : Non-convexity in the error landscape and the expressive capacity of deep neural networks »
Surya Ganguli -
2016 Poster: Exponential expressivity in deep neural networks through transient chaos »
Ben Poole · Subhaneil Lahiri · Maithra Raghu · Jascha Sohl-Dickstein · Surya Ganguli -
2016 Poster: An equivalence between high dimensional Bayes optimal inference and M-estimation »
Madhu Advani · Surya Ganguli -
2016 Poster: Deep Learning Models of the Retinal Response to Natural Scenes »
Lane McIntosh · Niru Maheswaranathan · Aran Nayebi · Surya Ganguli · Stephen Baccus -
2015 Poster: Deep Knowledge Tracing »
Chris Piech · Jonathan Bassen · Jonathan Huang · Surya Ganguli · Mehran Sahami · Leonidas Guibas · Jascha Sohl-Dickstein -
2014 Workshop: Deep Learning and Representation Learning »
Andrew Y Ng · Yoshua Bengio · Adam Coates · Roland Memisevic · Sharanyan Chetlur · Geoffrey E Hinton · Shamim Nemati · Bryan Catanzaro · Surya Ganguli · Herbert Jaeger · Phil Blunsom · Leon Bottou · Volodymyr Mnih · Chen-Yu Lee · Rich M Schwartz -
2014 Poster: Identifying and attacking the saddle point problem in high-dimensional non-convex optimization »
Yann N Dauphin · Razvan Pascanu · Caglar Gulcehre · Kyunghyun Cho · Surya Ganguli · Yoshua Bengio -
2013 Poster: A memory frontier for complex synapses »
Subhaneil Lahiri · Surya Ganguli -
2013 Oral: A memory frontier for complex synapses »
Subhaneil Lahiri · Surya Ganguli -
2010 Poster: Short-term memory in neuronal networks through dynamical compressed sensing »
Surya Ganguli · Haim Sompolinsky