Poster Thu, Dec 4, 2025 • 11:00 AM – 2:00 PM PST

Measuring what Matters: Construct Validity in Large Language Model Benchmarks

Andrew M. Bean ⋅ Ryan Othniel Kearns ⋅ Angelika Romanou ⋅ Franziska Sofia Hafner ⋅ Harry Mayne ⋅ Jan Batzner ⋅ Negar Foroutan Eghlidi ⋅ Chris Schmitz ⋅ Karolina Korgul ⋅ Hunar Batra ⋅ Oishi Deb ⋅ Emma Beharry ⋅ Cornelius Emde ⋅ Thomas Foster ⋅ Anna Gausen ⋅ María Grandury ⋅ Sophia Han ⋅ Valentin Hofmann ⋅ Lujain Ibrahim ⋅ Hazel Kim ⋅ Hannah Rose Kirk ⋅ Fangru Lin ⋅ Gabrielle Liu ⋅ Lennart Luettgau ⋅ Jabez Magomere ⋅ Jonathan Rystrøm ⋅ Anna Sotnikova ⋅ Yushi Yang ⋅ Yilun Zhao ⋅ Adel Bibi ⋅ Antoine Bosselut ⋅ Ronald Clark ⋅ Arman Cohan ⋅ Jakob Foerster ⋅ Yarin Gal ⋅ Scott Hale ⋅ Deborah Raji ⋅ Christopher Summerfield ⋅ Philip Torr ⋅ Cozmin Ududec ⋅ Luc Rocher ⋅ Adam Mahdi
