Poster Wed, Dec 3, 2025 • 11:00 AM – 2:00 PM PST

Position: Benchmarking is Broken - Don't Let AI be Its Own Judge

Zerui Cheng ⋅ Stella Wohnig ⋅ Ruchika Gupta ⋅ Samiul Alam ⋅ Tassallah Abdullahi ⋅ João Alves Ribeiro ⋅ Christian Nielsen-Garcia ⋅ Saif Mir ⋅ Siran Li ⋅ Jason Orender ⋅ Seyed Ali Bahrainian ⋅ Daniel Kirste ⋅ Aaron Gokaslan ⋅ Carsten Eickhoff ⋅ Ruben Wolff

Abstract
