Rephrasing natural text data with different languages and quality levels for Large Language Model pre-training

Michael Pieler ⋅ Marco Bellagente ⋅ Hannah Teufel ⋅ Duy Phung ⋅ Nathan Cooper ⋅ Jonathan Tow ⋅ Paulo Rocha ⋅ Reshinth Adithyan ⋅ Zaid Alyafeai ⋅ Nikhil Pinnaparaju ⋅ Maksym Zhuravinskyi ⋅ Carlos Riquelme Ruiz
Keywords: Data Efficiency

Abstract

