Poster Fri, Dec 5, 2025 • 11:00 AM – 2:00 PM PST

Tapered Off-Policy REINFORCE - Stable and efficient reinforcement learning for large language models

Nicolas Le Roux ⋅ Marc Bellemare ⋅ Jonathan Lebensold ⋅ Arnaud Bergeron ⋅ Joshua Greaves ⋅ Alexandre Fréchette ⋅ Carolyne Pelletier ⋅ Eric Thibodeau-Laufer ⋅ Sándor Tóth ⋅ Sam Work

Abstract
