

Poster

Tapered Off-Policy REINFORCE: Stable and efficient reinforcement learning for large language models

Nicolas Le Roux ⋅ Marc Bellemare ⋅ Jonathan Lebensold ⋅ Arnaud Bergeron ⋅ Joshua Greaves ⋅ Alexandre Fréchette ⋅ Carolyne Pelletier ⋅ Eric Thibodeau-Laufer ⋅ Sándor Tóth ⋅ Sam Work
2025 Poster


