

Poster

LoQT: Low Rank Adapters for Quantized Training

Sebastian Loeschcke · Mads Toftrup · Michael Kastoryano · Serge Belongie · Vésteinn Snæbjarnarson

West Ballroom A-D #6104
[ Project Page ]
Thu 12 Dec 4:30 p.m. PST — 7:30 p.m. PST

Abstract:

Training of large neural networks requires significant computational resources. Despite advances using low-rank adapters and quantization, pretraining of models such as LLMs on consumer hardware has not been possible without model sharding, offloading during training, or per-layer gradient updates. To address these limitations, we propose LoQT, a method for efficiently training quantized models. LoQT uses gradient-based tensor factorization to initialize low-rank trainable weight matrices that are periodically merged into quantized full-rank weight matrices. Our approach is suitable for both pretraining and fine-tuning models, which we demonstrate experimentally for language modeling and downstream task adaptation. We find that LoQT enables efficient training of models up to 7B parameters on a consumer-grade 24GB GPU. We also demonstrate the feasibility of training a 13B parameter model using per-layer gradient updates on the same hardware.
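The abstract describes a layer that keeps a quantized, frozen full-rank weight plus trainable low-rank factors, with the factors initialized from a gradient-based factorization and periodically merged back into the quantized weight. Below is a minimal PyTorch sketch of that idea under simplifying assumptions: a symmetric per-tensor int8 quantizer and SVD of a gradient estimate for initialization. The class and method names (`LoQTLinear`, `init_from_gradient`, `merge_and_requantize`) and the hyperparameters are illustrative, not the authors' implementation.

```python
# Minimal sketch of a "quantized weight + low-rank adapter" layer with
# periodic merging, assuming int8 per-tensor quantization. Illustrative only.
import torch
import torch.nn as nn


def quantize_int8(w: torch.Tensor):
    """Symmetric per-tensor int8 quantization: returns int8 weights and a scale."""
    scale = w.abs().max() / 127.0
    q = torch.clamp((w / scale).round(), -128, 127).to(torch.int8)
    return q, scale


def dequantize(q: torch.Tensor, scale: torch.Tensor) -> torch.Tensor:
    return q.to(torch.float32) * scale


class LoQTLinear(nn.Module):
    """Frozen quantized weight Q plus a trainable low-rank update P @ B."""

    def __init__(self, weight: torch.Tensor, rank: int = 8):
        super().__init__()
        q, scale = quantize_int8(weight)
        self.register_buffer("q_weight", q)
        self.register_buffer("scale", scale)
        out_f, in_f = weight.shape
        self.rank = rank
        # P is set from a gradient-based factorization and then held fixed;
        # B is trained. B starts at zero so the effective weight is unchanged.
        self.P = nn.Parameter(torch.zeros(out_f, rank), requires_grad=False)
        self.B = nn.Parameter(torch.zeros(rank, in_f))

    @torch.no_grad()
    def init_from_gradient(self, grad: torch.Tensor):
        """Initialize P with the top-r left singular vectors of a gradient estimate."""
        U, _, _ = torch.linalg.svd(grad.float(), full_matrices=False)
        self.P.data = U[:, : self.rank].contiguous()
        self.B.data.zero_()

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        w = dequantize(self.q_weight, self.scale)
        return x @ (w + self.P @ self.B).t()

    @torch.no_grad()
    def merge_and_requantize(self):
        """Fold the low-rank update into the full-rank weight, then re-quantize."""
        w = dequantize(self.q_weight, self.scale) + self.P @ self.B
        q, scale = quantize_int8(w)
        self.q_weight.copy_(q)
        self.scale.copy_(scale)
        self.B.data.zero_()
```

In a training loop one would call `merge_and_requantize()` every fixed number of steps (and optionally re-run `init_from_gradient` on a fresh gradient estimate), so that only the small factor `B` ever carries optimizer state while the full-rank weight stays quantized; how the actual method schedules these updates is detailed in the paper, not in this sketch.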
