Stepwise Guided Policy Optimization: Coloring your Incorrect Reasoning in GRPO

Peter Chen · Xiaopeng Li · Ziniu Li · Xi Chen · Tianyi Lin

Abstract
