How Well Does Self-Supervised Pre-Training Perform with Streaming ImageNet?
Dapeng Hu · Qizhengqiu Lu · Lanqing Hong · Hailin Hu · Yifan Zhang · Zhenguo Li · Jiashi Feng
Event URL: https://openreview.net/forum?id=gYgMSlZznS

The common self-supervised pre-training practice requires collecting massive unlabeled data together and then training a representation model, a scheme dubbed joint training. However, in real-world scenarios where data are decentralized or collected in a streaming fashion, joint training is storage-heavy, time-consuming, and sometimes infeasible. A more efficient alternative is to train a model continually on streaming data, dubbed sequential training, which, however, has not been investigated in previous work. To this end, we conduct thorough experiments on self-supervised pre-training with streaming data. Specifically, we evaluate and compare the transfer performance of self-supervised models under joint training and sequential training. We pre-train over 400 models on 4 types of streaming pre-training data from ImageNet and DomainNet, and evaluate them on 3 kinds of downstream tasks across 12 downstream datasets. Surprisingly, we find that (1) for self-supervised pre-training, with the help of simple data replay or parameter regularization, sequential training achieves transfer performance comparable to joint training on various streaming data, and (2) when trained sequentially on streaming data chunks, self-supervised models exhibit visibly less forgetting of the first data chunk than supervised models. Based on these findings, we believe sequential self-supervised training is a more efficient yet performance-competitive representation learning practice for real-world pre-training applications.
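The sequential training scheme with data replay described in the abstract can be sketched as a training loop over streaming chunks, where a small buffer of past samples is mixed into each new chunk. The following is a minimal, hypothetical illustration: `train_step` is a caller-supplied callback standing in for one self-supervised update pass, and the buffer size, replay fraction, and reservoir-sampling policy are illustrative assumptions, not details from the paper.

```python
import random

def sequential_train(chunks, train_step, replay_size=100, replay_fraction=0.25, seed=0):
    """Sequentially train on streaming data chunks with simple data replay.

    Instead of joint training on all data at once, the model is updated
    one chunk at a time; a small replay buffer of past samples is mixed
    into each chunk to mitigate forgetting of earlier chunks.
    `train_step` is a hypothetical callback that updates the model on a batch.
    """
    rng = random.Random(seed)
    buffer = []   # replay buffer holding a sample of previously seen data
    seen = 0      # total number of samples observed so far
    for chunk in chunks:
        # Mix a fraction of replayed old samples into the current chunk.
        n_replay = min(len(buffer), int(len(chunk) * replay_fraction))
        batch = list(chunk) + rng.sample(buffer, n_replay)
        train_step(batch)
        # Reservoir sampling keeps the buffer an unbiased sample of the stream.
        for x in chunk:
            seen += 1
            if len(buffer) < replay_size:
                buffer.append(x)
            else:
                j = rng.randrange(seen)
                if j < replay_size:
                    buffer[j] = x
    return buffer
```

The first chunk is trained on as-is (the buffer starts empty); every later chunk is augmented with a sample of earlier data, which is the "simple data replay" the abstract credits for closing the gap between sequential and joint training.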

Author Information

Dapeng Hu (National University of Singapore)
Qizhengqiu Lu (Huawei Technologies Ltd.)
Lanqing Hong (Huawei Noah's Ark Lab)
Hailin Hu (Huawei Technologies Ltd.)
Yifan Zhang (National University of Singapore)
Zhenguo Li (Noah's Ark Lab, Huawei Tech Investment Co Ltd)
Jiashi Feng (UC Berkeley)