Pairwise Proximal Policy Optimization: Harnessing Relative Feedback for LLM Alignment

Tianhao Wu ⋅ Banghua Zhu ⋅ Ruoyu Zhang ⋅ Zhaojin Wen ⋅ Kannan Ramchandran ⋅ Jiantao Jiao

Abstract
