Probe-Rewrite-Evaluate: A Workflow for Reliable Benchmarks and Quantifying Evaluation Awareness

Lang Xiong ⋅ Nishant Bhargava ⋅ Jeremy Chang ⋅ Jianhang Hong ⋅ Haihao Liu ⋅ Vasu Sharma ⋅ Kevin Zhu

Abstract

