Idea: Fairness Constraints as Reliability Guarantees for RLHF Reward Models

Advay Samnerkar ⋅ Sagnik Bhattacharya ⋅ Kailash Ranganathan ⋅ Ashwinee Panda ⋅ Kevin Zhu

Abstract
