Poster
The Impact of Initialization on the Finetuning Dynamics in LoRA
Soufiane Hayou · Nikhil Ghosh · Bin Yu
East Exhibit Hall A-C #4923
Wed 11 Dec 4:30 p.m. — 7:30 p.m. PST
Abstract:
In this paper, we study the role of initialization in Low Rank Adaptation (LoRA) as originally introduced in Hu et al. (2021). Essentially, to start finetuning from the pretrained model, one can either initialize $B$ to zero and $A$ randomly, or vice versa. In both cases, the product $BA$ equals zero at initialization, so finetuning starts from the pretrained model. These two initialization schemes are seemingly similar: in principle, they should yield the same performance and share the same optimal learning rate. We demonstrate that this is a *wrong intuition* and that the first scheme (initializing $B$ to zero and $A$ randomly) yields, on average in our experiments, better performance than the second. Our theoretical analysis suggests that the reason is that the first initialization allows the use of larger learning rates (without causing output instability) than the second, resulting in more efficient learning under the first scheme. We validate our results with extensive experiments.
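To make the two initialization schemes concrete, here is a minimal PyTorch sketch of a LoRA adapter around a frozen linear layer. The class name `LoRALinear`, the `init_scheme` flag, and the hyperparameters `r` and `alpha` are illustrative assumptions, not the authors' code; the point is only that in both schemes $BA = 0$ at initialization, so only the choice of which factor is random differs.

```python
import torch
import torch.nn as nn


class LoRALinear(nn.Module):
    """Hypothetical minimal LoRA adapter: W x + (alpha / r) * B A x,
    with the pretrained weight W frozen and only A, B trained."""

    def __init__(self, base: nn.Linear, r: int = 8, alpha: float = 16.0,
                 init_scheme: str = "init_A"):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad = False  # pretrained weights stay frozen

        d_out, d_in = base.weight.shape
        self.A = nn.Parameter(torch.empty(r, d_in))
        self.B = nn.Parameter(torch.empty(d_out, r))
        self.scaling = alpha / r

        if init_scheme == "init_A":
            # Scheme 1: A random, B zero.
            nn.init.kaiming_uniform_(self.A, a=5 ** 0.5)
            nn.init.zeros_(self.B)
        elif init_scheme == "init_B":
            # Scheme 2: B random, A zero.
            nn.init.kaiming_uniform_(self.B, a=5 ** 0.5)
            nn.init.zeros_(self.A)
        else:
            raise ValueError(f"unknown init_scheme: {init_scheme}")
        # In either scheme, B @ A == 0 at initialization, so the adapted
        # model's outputs match the pretrained model before finetuning.

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Low-rank update: x -> A x -> B (A x), scaled by alpha / r.
        return self.base(x) + self.scaling * (x @ self.A.T) @ self.B.T
```

The paper's claim concerns finetuning dynamics, not the forward pass: with `init_A` the random factor is $A$, which (per the analysis above) tolerates larger learning rates without output instability than `init_B`.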