

Spotlight Poster

The FineWeb Datasets: Decanting the Web for the Finest Text Data at Scale

Guilherme Penedo · Hynek Kydlíček · Loubna Ben Allal · Anton Lozhkov · Margaret Mitchell · Colin Raffel · Leandro Von Werra · Thomas Wolf
2024 Spotlight Poster
