

Poster

Initializing Variable-sized Vision Transformers from Learngene with Learnable Transformations

Shiyu Xia · Xu Yang · Yuankun Zu · Xin Geng


Abstract:

In practical scenarios, it is necessary to build variable-sized models to accommodate diverse resource constraints, with weight initialization serving as a crucial step preceding training. The recently introduced Learngene framework first learns one compact module, termed the learngene, from a large well-trained model, and then transforms the learngene to initialize variable-sized models. However, existing Learngene methods provide limited guidance on transforming the learngene: the transformation mechanisms are manually designed and generally lack a learnable component. Moreover, these methods only consider transforming the learngene along the depth dimension, thus constraining its flexibility. Motivated by these concerns, we propose a novel and effective Learngene approach termed LeTra (Learnable Transformations), in which we transform the learngene module along both the depth and width dimensions with a set of learnable transformation matrices for flexible variable-sized model initialization. Specifically, we design an auxiliary model comprising the compact learngene module and the learnable transformations, enabling both modules to be trained. Given the differing sizes of the target models, we select specific parameters from the well-trained transformations to transform the learngene, guided by several well-designed strategies: continuous selection, step-wise selection, and random selection. Extensive experiments on ImageNet-1K demonstrate that descendant models (Des-Nets) initialized via LeTra outperform those trained from scratch after only 1 epoch of tuning. When transferring to downstream datasets, LeTra achieves better results while reducing training costs by around 30× compared to training from scratch. Compared to the pre-training and fine-tuning approach, LeTra demonstrates superior performance while reducing initialization parameters by around 9.8× and pre-training costs by around 3.8×.
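As a rough illustration of the mechanism sketched in the abstract, the PyTorch snippet below expands a compact learngene weight to a wider target layer via learnable transformation matrices and reuses slices of those matrices through continuous, step-wise, or random selection. The module name, the W_t = A · W_g · B form, and all shapes are illustrative assumptions for a minimal sketch, not the authors' implementation.

```python
# Minimal sketch of width expansion from a compact learngene (assumed form).
import torch
import torch.nn as nn

class LearnableWidthTransform(nn.Module):
    """Expand a learngene weight (d_gene x d_gene) to a wider target
    (d_target x d_target) via learnable matrices: W_t = A @ W_g @ B.
    This factorized form is a hypothetical stand-in for the paper's transformations."""
    def __init__(self, d_gene: int, d_max: int):
        super().__init__()
        self.A = nn.Parameter(torch.randn(d_max, d_gene) * 0.02)  # output-side transform
        self.B = nn.Parameter(torch.randn(d_gene, d_max) * 0.02)  # input-side transform

    def forward(self, w_gene: torch.Tensor, rows: torch.Tensor, cols: torch.Tensor):
        # Select the rows/columns of the trained transforms that match the target width.
        return self.A[rows] @ w_gene @ self.B[:, cols]

def select_indices(d_target: int, d_max: int, strategy: str = "continuous"):
    """Pick which slices of the trained transformation to reuse for a given target size."""
    if strategy == "continuous":   # first d_target positions
        return torch.arange(d_target)
    if strategy == "step-wise":    # evenly spaced positions
        return torch.linspace(0, d_max - 1, d_target).round().long()
    if strategy == "random":       # random subset (sorted for readability)
        return torch.randperm(d_max)[:d_target].sort().values
    raise ValueError(strategy)

# Usage: initialize one 384-wide target layer from a 192-wide learngene layer.
d_gene, d_max, d_target = 192, 768, 384
w_gene = torch.randn(d_gene, d_gene) * 0.02            # compact learngene weight
transform = LearnableWidthTransform(d_gene, d_max)     # would be trained in the auxiliary model
idx = select_indices(d_target, d_max, "step-wise")
w_target = transform(w_gene, idx, idx)                 # (384 x 384) initialization
print(w_target.shape)  # torch.Size([384, 384])
```

Depth expansion would follow the same pattern, producing one transformed copy of the learngene per target layer; the loop over layers is omitted here for brevity.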
