Poster Thu, Dec 4, 2025 • 11:00 AM – 2:00 PM PST

Analyzing Similarity Metrics for Data Selection for Language Model Pretraining

Dylan Sam ⋅ Ayan Chakrabarti ⋅ Afshin Rostamizadeh ⋅ Srikumar Ramalingam ⋅ Gui Citovsky ⋅ Sanjiv Kumar

Abstract
