Skip to yearly menu bar Skip to main content


Parameter-efficient Tuning of Large-scale Multimodal Foundation Model

Haixin Wang · Xinlong Yang · Jianlong Chang · Dian Jin · Jinan Sun · Shikun Zhang · Xiao Luo · Qi Tian

Great Hall & Hall B1+B2 (level 1) #2010
[ ]
[ Paper [ Poster [ OpenReview
Thu 14 Dec 8:45 a.m. PST — 10:45 a.m. PST


Driven by the progress of large-scale pre-training, parameter-efficient transfer learning has gained immense popularity across different subfields of Artificial Intelligence. The core is to adapt the model to downstream tasks with only a small set of parameters. Recently, researchers have leveraged such proven techniques in multimodal tasks and achieve promising results. However, two critical issues remain unresolved: how to further reduce the complexity with lightweight design and how to boost alignment between modalities under extremely low parameters. In this paper, we propose A gracefUl pRompt framewOrk for cRoss-modal trAnsfer (AURORA) to overcome these challenges. Considering the redundancy in existing architectures, we first utilize the mode approximation to generate 0.1M trainable parameters to implement the multimodal parameter-efficient tuning, which explores the low intrinsic dimension with only 0.04% parameters of the pre-trained model. Then, for better modality alignment, we propose the Informative Context Enhancement and Gated Query Transformation module under extremely few parameters scenes. A thorough evaluation on six cross-modal benchmarks shows that it not only outperforms the state-of-the-art but even outperforms the full fine-tuning approach. Our code is available at:

Chat is not available.