Poster

Accelerating Self-supervised Learning Pretraining

Jinhong Lin · Cheng-En Wu · Yibing Wei · Pedro Morgado

East Exhibit Hall A-C #2202
Fri 13 Dec 11 a.m. PST — 2 p.m. PST

Abstract:

Our work tackles the computational challenges of contrastive learning methods, particularly for the pretraining of Vision Transformers (ViTs). Despite the effectiveness of contrastive learning, the substantial computational resources required for training often hinder its practical application. To mitigate this issue, we propose an acceleration framework that leverages the ViT's unique ability to generalize across inputs of varying sequence lengths. Our method employs a mix of sequence compression strategies, including randomized token dropout and flexible patch scaling, to reduce the cost of gradient estimation and accelerate convergence. We further provide an in-depth analysis of the gradient estimation error of various acceleration strategies and their performance on downstream tasks, offering valuable insights into the trade-offs between acceleration and performance. We also propose a novel automated procedure to identify an optimal acceleration schedule that dynamically adjusts to the training progress, ensuring efficient training without sacrificing downstream performance. Our work significantly reduces the computational overhead of self-supervised learning (SSL) training on the ImageNet dataset, making it more accessible to research communities and practitioners with limited computational resources. We achieve up to a 4x speedup in model convergence, highlighting the potential of our methods to democratize SSL training for ViTs and other transformer-based models.
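
To make the sequence-compression idea concrete, the sketch below illustrates in PyTorch how the two strategies named in the abstract can shorten a ViT's token sequence: randomized token dropout keeps a random subset of patch tokens, and patch scaling downscales the input image so a fixed patch size covers a larger area, yielding fewer patches. This is an illustrative example only, not the authors' implementation; the function names and the keep_ratio and patch_scale parameters are assumptions made for the sketch.

```python
import torch
import torch.nn.functional as F


def random_token_dropout(tokens: torch.Tensor, keep_ratio: float) -> torch.Tensor:
    """Randomly keep a fraction of patch tokens (illustrative, not the paper's code).

    tokens: (batch, num_tokens, dim) patch embeddings, excluding any CLS token.
    keep_ratio: fraction of tokens to retain, e.g. 0.5 halves the sequence length.
    """
    b, n, d = tokens.shape
    n_keep = max(1, int(n * keep_ratio))
    # Draw an independent random permutation per sample and keep the first n_keep indices.
    idx = torch.rand(b, n, device=tokens.device).argsort(dim=1)[:, :n_keep]
    return tokens.gather(1, idx.unsqueeze(-1).expand(-1, -1, d))


def rescale_for_larger_effective_patches(images: torch.Tensor, patch_scale: float) -> torch.Tensor:
    """Downscale images so a fixed patch size spans a larger effective region.

    images: (batch, 3, H, W). patch_scale > 1 shrinks the image, reducing the
    number of patches (sequence length) by roughly patch_scale**2.
    """
    b, c, h, w = images.shape
    new_h, new_w = int(h / patch_scale), int(w / patch_scale)
    return F.interpolate(images, size=(new_h, new_w), mode="bicubic", align_corners=False)


if __name__ == "__main__":
    # Toy usage: a 14x14 token grid (196 tokens) compressed to 98 tokens via dropout.
    tokens = torch.randn(8, 196, 768)
    print(random_token_dropout(tokens, keep_ratio=0.5).shape)  # (8, 98, 768)
    # 224x224 images downscaled 2x -> 112x112, i.e. 49 patches at patch size 16.
    imgs = torch.randn(8, 3, 224, 224)
    print(rescale_for_larger_effective_patches(imgs, patch_scale=2.0).shape)  # (8, 3, 112, 112)
```

Either form of compression shortens the sequence the transformer must process, which is where the reduced gradient-estimation cost and faster convergence described above would come from; how aggressively to compress at each stage of training is what the paper's automated acceleration schedule is meant to decide.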
