Exploration v.s. Exploitation: Rethinking RLVR through Clipping, Entropy, and Spurious Reward

Peter Chen · Xiaopeng Li · Ziniu Li · Wotao Yin · Xi Chen · Tianyi Lin

Abstract
