Since its release in 2010, ImageNet has played an instrumental role in the development of deep learning architectures for computer vision, enabling neural networks to greatly outperform hand-crafted visual representations. It also quickly became the go-to benchmark for model architectures and training techniques whose reach eventually extended far beyond image classification, and today’s models are getting close to “solving” it. Models trained on ImageNet have served as strong initializations for numerous downstream tasks, and the dataset has been leveraged and reinvented for purposes well beyond its original goal of training classification models, such as few-shot learning, self-supervised learning, and semi-supervised learning. Re-creations of the benchmark enable the evaluation of novel challenges such as robustness, bias, and concept generalization, and more accurate labels have been provided for its validation set. About ten years later, ImageNet symbolizes a decade of staggering advances in computer vision, deep learning, and artificial intelligence.
We believe now is a good time to discuss what’s next: Did we solve ImageNet? What are the main lessons learned from this benchmark? What should the next generation of ImageNet-like benchmarks encompass? Is language supervision a promising alternative? How can we reflect on the diverse requirements for good datasets and models, such as fairness, privacy, security, generalization, scale, and efficiency?
Mon 4:00 a.m. - 4:30 a.m. | Opening (Opening presentation) | Opening ceremony
Mon 4:30 a.m. - 5:00 a.m. | Fairness and privacy aspects of ImageNet (Talk) | Olga Russakovsky · Kaiyu Yang
Mon 5:00 a.m. - 5:30 a.m. | OpenImages: One Dataset for Many Computer Vision Tasks (Talk) | Vittorio Ferrari
Mon 5:30 a.m. - 6:00 a.m. | Object recognition in machines and brains (Talk) | Matthias Bethge
Mon 6:00 a.m. - 7:00 a.m. | Live panel: The future of ImageNet (Live panel) | Matthias Bethge · Vittorio Ferrari · Olga Russakovsky
Mon 7:30 a.m. - 7:45 a.m. | Spotlight talk: ResNet strikes back: An improved training procedure in timm (Oral session) | Hugo Touvron
The influential Residual Networks designed by He et al. remain the gold-standard architecture in numerous scientific publications. They typically serve as the default architecture in studies, or as baselines when new architectures are proposed. Yet there has been significant progress on best practices for training neural networks since the inception of the ResNet architecture in 2015. Novel optimization and data-augmentation procedures have increased the effectiveness of training recipes. In this paper, we re-evaluate the performance of the vanilla ResNet-50 when trained with a procedure that integrates such advances. We share competitive training settings and pre-trained models in the timm open-source library, with the hope that they will serve as better baselines for future work. For instance, with our more demanding training setting, a vanilla ResNet-50 reaches 80.4% top-1 accuracy at resolution 224x224 on ImageNet-val without extra data or distillation. We also report the performance achieved with popular models with our training procedure.
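As a companion to this abstract, here is a minimal sketch of pulling a pretrained ResNet-50 from the timm library and running a forward pass. The specific weight tag "resnet50.a1_in1k" is an assumption about timm's model registry (recent timm releases expose the paper's recipes under tags like this); plain "resnet50" also works.

```python
# Minimal sketch: load a pretrained ResNet-50 baseline from timm and run
# one forward pass. The weight tag "resnet50.a1_in1k" is an assumption
# about the registry in recent timm versions; "resnet50" alone also works.
import torch
import timm

model = timm.create_model("resnet50.a1_in1k", pretrained=True)
model.eval()

# Build the preprocessing transform that matches the pretrained weights.
cfg = timm.data.resolve_data_config({}, model=model)
transform = timm.data.create_transform(**cfg)

# Dummy 224x224 input; in practice, apply `transform` to a PIL image first.
x = torch.randn(1, 3, 224, 224)
with torch.no_grad():
    logits = model(x)
print(logits.shape)  # torch.Size([1, 1000]): ImageNet's 1000 classes
```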
Mon 7:45 a.m. - 8:45 a.m. | Poster session A (Poster session)
Mon 8:45 a.m. - 9:15 a.m. | Is ImageNet Solved? Evaluating Machine Accuracy (Talk) | Becca Roelofs
Mon 9:15 a.m. - 9:45 a.m. | From ImageNet to Image Classification (Talk) | Shibani Santurkar
Mon 9:45 a.m. - 10:15 a.m. | Are we done with ImageNet? (Talk) | Alexander Kolesnikov
We ask whether recent progress on the ImageNet classification benchmark continues to represent meaningful generalization, or whether the community has started to overfit to the idiosyncrasies of its labeling procedure. We therefore develop a significantly more robust procedure for collecting human annotations of the ImageNet validation set. Using these new labels, we reassess the accuracy of recently proposed ImageNet classifiers, and find their gains to be substantially smaller than those reported on the original labels. Furthermore, we find the original ImageNet labels to no longer be the best predictors of this independently-collected set, indicating that their usefulness in evaluating vision models may be nearing an end. Nevertheless, we find our annotation procedure to have largely remedied the errors in the original labels, reinforcing ImageNet as a powerful benchmark for future research in visual recognition.
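The evaluation this abstract describes can be made concrete with a small sketch: under reassessed labels, a top-1 prediction counts as correct if it falls anywhere in the set of labels annotators judged valid for that image. The data structures below (a prediction list and a list of label sets) are illustrative assumptions, not the released file format.

```python
# Sketch of multi-label ("reassessed labels" style) top-1 evaluation:
# a prediction is correct if it is in the set of valid labels for the image.
from typing import List, Set

def reassessed_accuracy(predictions: List[int],
                        label_sets: List[Set[int]]) -> float:
    """Fraction of images whose top-1 prediction is among the valid labels.

    Images with an empty label set are skipped, mirroring the convention of
    only scoring images with at least one confirmed label.
    """
    scored = [(p, s) for p, s in zip(predictions, label_sets) if s]
    correct = sum(p in s for p, s in scored)
    return correct / len(scored)

# Toy example: 3 images; the second has two acceptable labels.
preds = [5, 17, 2]
labels = [{5}, {3, 17}, {9}]
print(reassessed_accuracy(preds, labels))  # 2/3 ≈ 0.667
```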
Mon 10:15 a.m. - 11:15 a.m. | Live panel: Did we solve ImageNet? (Live panel) | Shibani Santurkar · Alexander Kolesnikov · Becca Roelofs
Mon 11:45 a.m. - 12:15 p.m. | Uncovering the Deep Unknowns of ImageNet Models: Challenges and Opportunities (Talk) | Yixuan Li
Mon 12:15 p.m. - 12:45 p.m. | ImageNet models from the trenches (Talk) | Ross Wightman
Mon 12:45 p.m. - 1:15 p.m. | Using ImageNet to Measure Robustness and Uncertainty (Talk) | Dawn Song · Dan Hendrycks
Mon 1:15 p.m. - 2:15 p.m. | Live panel: Perspectives on ImageNet (Live panel) | Dawn Song · Ross Wightman · Dan Hendrycks
Mon 2:30 p.m. - 3:00 p.m. | ImageNets of "x": ImageNet's Infrastructural Impact (Talk) | Emily Denton · Alex Hanna
Mon 3:00 p.m. - 3:30 p.m. | Live panel: ImageNets of "x": ImageNet's Infrastructural Impact (Live panel) | Emily Denton · Alex Hanna
Mon 3:45 p.m. - 4:00 p.m. | Spotlight talk: Learning Background Invariance Improves Generalization and Robustness in Self-Supervised Learning on ImageNet and Beyond (Oral session) | Chaitanya Ryali
Unsupervised representation learning is an important challenge in computer vision. Recent progress in self-supervised learning has demonstrated promising results in multiple visual tasks. An important ingredient in high-performing self-supervised methods is the use of data augmentation by training models to place different augmented views of the same image nearby in embedding space. However, commonly used augmentation pipelines treat images holistically, ignoring the semantic relevance of parts of an image (e.g. a subject vs. a background), which can lead to the learning of spurious correlations. Our work addresses this problem by investigating a class of simple, yet highly effective “background augmentations”, which encourage models to focus on semantically relevant content by discouraging them from focusing on image backgrounds. Through a systematic, comprehensive investigation, we show that background augmentations lead to improved generalization with substantial improvements (~1-2% on ImageNet) in performance across a spectrum of state-of-the-art self-supervised methods (MoCo-v2, BYOL, SwAV) on a variety of tasks, even allowing us to reach within 0.1% of supervised performance on ImageNet. We also find improved label efficiency with even larger performance improvements in limited-label settings (up to 4.2%). Further, we find improved training efficiency, attaining a benchmark accuracy of 74.4% in only 100 epochs, outperforming many recent self-supervised learning methods trained for 800-1000 epochs. Importantly, we also demonstrate that background augmentations boost generalization and robustness in a number of out-of-distribution settings, including the Backgrounds Challenge, natural adversarial examples, adversarial attacks, ImageNet-Renditions, and ImageNet ReaL. We also make progress in completely unsupervised saliency detection, generating in the process the saliency masks that we use for background augmentations.
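To make the core idea concrete, here is an illustrative sketch of one possible background augmentation: compositing the salient foreground of an image, selected by a saliency mask, over a background taken from another image. This is a reconstruction under stated assumptions (the function name background_swap, the alpha-blending rule, and the binary toy mask are all ours), not the authors' implementation.

```python
# Illustrative sketch of a "background swap" augmentation: paste the
# salient foreground of one image onto the background of another, before
# the usual self-supervised augmentation pipeline. Mask quality and the
# blending rule are assumptions, not the published method.
import numpy as np

def background_swap(img: np.ndarray, mask: np.ndarray,
                    bg_img: np.ndarray) -> np.ndarray:
    """Composite the masked foreground of `img` over `bg_img`.

    img, bg_img: float arrays of shape (H, W, 3) with values in [0, 1].
    mask: float array of shape (H, W) in [0, 1], 1 where salient.
    """
    alpha = mask[..., None]                # broadcast over RGB channels
    return alpha * img + (1.0 - alpha) * bg_img

# Toy example with random data standing in for real images and a mask.
rng = np.random.default_rng(0)
img = rng.random((224, 224, 3))
bg = rng.random((224, 224, 3))
mask = (rng.random((224, 224)) > 0.5).astype(np.float64)
aug = background_swap(img, mask, bg)
assert aug.shape == (224, 224, 3)
```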
Mon 4:00 p.m. - 5:00 p.m. | Poster session B (Poster session)
Mon 5:00 p.m. - 5:15 p.m. | Closing & awards (Workshop closing)