

RiskPO: Risk-based Policy Optimization with Verifiable Reward for LLM Post-Training

Tao Ren ⋅ Jinyang Jiang ⋅ Hui Yang ⋅ Wan Tian ⋅ Yijie Peng

Abstract
