Continual Pre-Training of Large Language Models: How to (re)warm your model?

Kshitij Gupta ⋅ Benjamin Thérien ⋅ Adam Ibrahim ⋅ Mats L Richter ⋅ Quentin Anthony ⋅ Eugene Belilovsky ⋅ Irina Rish ⋅ Timothee Lesort

Abstract
