Scaling Smart: Accelerating Large Language Model Pre-Training with Small Model Initialization

Mohammad Samragh · Iman Mirzadeh · Keivan Alizadeh-Vahid · Fartash Faghri · Minsik Cho · Moin Nabi · Devang Naik · Mehrdad Farajtabar
Keywords: Efficient Training

Abstract
