

Pairwise Proximal Policy Optimization: Harnessing Relative Feedback for LLM Alignment

Tianhao Wu · Banghua Zhu · Ruoyu Zhang · Zhaojin Wen · Kannan Ramchandran · Jiantao Jiao

Abstract
