Spotlight Poster
PiSSA: Principal Singular Values and Singular Vectors Adaptation of Large Language Models
Fanxu Meng · Zhaohui Wang · Muhan Zhang
East Exhibit Hall A-C #2206
Abstract:
To parameter-efficiently fine-tune (PEFT) large language models (LLMs), the low-rank adaptation (LoRA) method approximates the model changes $\Delta W \in \mathbb{R}^{m \times n}$ through the product of two matrices $A \in \mathbb{R}^{m \times r}$ and $B \in \mathbb{R}^{r \times n}$, where $r \ll \min(m, n)$, $A$ is initialized with Gaussian noise, and $B$ with zeros. LoRA **freezes the original model $W$** and **updates the "Noise \& Zero" adapter**, which may lead to slow convergence. To overcome this limitation, we introduce **P**r**i**ncipal **S**ingular values and **S**ingular vectors **A**daptation (PiSSA). PiSSA shares the same architecture as LoRA, but initializes the adapter matrices $A$ and $B$ with the principal components of the original matrix $W$, and puts the remaining components into a residual matrix $W^{res} \in \mathbb{R}^{m \times n}$ which is frozen during fine-tuning. Compared to LoRA, PiSSA **updates the principal components** while **freezing the "residual" parts**, allowing faster convergence and enhanced performance. Comparative experiments of PiSSA and LoRA across 11 different models, ranging from 184M to 70B parameters and encompassing 5 NLG and 8 NLU tasks, reveal that PiSSA consistently outperforms LoRA under identical experimental setups. On the GSM8K benchmark, Gemma-7B fine-tuned with PiSSA achieves an accuracy of 77.7\%, surpassing LoRA's 74.53\% by 3.25\%. Because it shares the same architecture as LoRA, PiSSA is also compatible with quantization to further reduce the memory requirement of fine-tuning. Compared to QLoRA, QPiSSA (PiSSA with 4-bit quantization) exhibits smaller quantization errors in the initial stages. Fine-tuning LLaMA-3-70B on GSM8K, QPiSSA attains an accuracy of 86.05\%, exceeding the performance of QLoRA at 81.73\%. Leveraging a fast SVD technique, PiSSA can be initialized in only a few seconds, presenting a negligible cost for transitioning from LoRA to PiSSA.
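For illustration, the SVD-based initialization described above can be sketched in a few lines of PyTorch. This is a minimal sketch under stated assumptions, not the authors' released implementation: the function name `pissa_init`, the rank `r`, and the example shapes are illustrative, splitting the top-$r$ singular values symmetrically between $A$ and $B$ via their square roots; a fast randomized SVD could replace the exact SVD, as the abstract notes.

```python
import torch

def pissa_init(W: torch.Tensor, r: int):
    """Illustrative sketch: split a weight matrix W (m x n) into a frozen
    residual matrix W_res and trainable low-rank factors A (m x r), B (r x n)
    initialized from the top-r singular values/vectors of W, so that
    W = W_res + A @ B. Not the authors' official implementation."""
    # SVD of the original weight matrix (a fast randomized SVD could be
    # substituted here to cut initialization time, per the abstract).
    U, S, Vh = torch.linalg.svd(W, full_matrices=False)
    # Principal components initialize the adapter, with sqrt(S_r) shared
    # between the two factors: A = U_r * sqrt(S_r), B = sqrt(S_r) * Vh_r.
    sqrt_S = torch.sqrt(S[:r])
    A = U[:, :r] * sqrt_S                 # (m, r)
    B = sqrt_S.unsqueeze(1) * Vh[:r, :]   # (r, n)
    # Remaining components form the residual matrix, frozen during fine-tuning.
    W_res = W - A @ B
    return W_res, A, B

# Example: a 768 x 2048 weight with a rank-16 adapter (shapes are arbitrary).
W = torch.randn(768, 2048)
W_res, A, B = pissa_init(W, r=16)
print(torch.allclose(W_res + A @ B, W, atol=1e-4))  # split is exact up to numerics
```

At inference or training time, the forward pass then uses $W^{res} + AB$ in place of $W$, so only the low-rank factors carrying the principal components receive gradient updates while the residual stays frozen.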