Idea: Fairness Constraints as Reliability Guarantees for RLHF Reward Models

Advay Samnerkar · Sagnik Bhattacharya · Kailash Ranganathan · Ashwinee Panda · Kevin Zhu

Abstract
