RiskPO: Risk-based Policy Optimization with Verifiable Reward for LLM Post-Training

Tao Ren · Jinyang Jiang · Hui Yang · Wan Tian · Yijie Peng

Abstract
