Workshop

Evaluating Evaluations: Examining Best Practices for Measuring Broader Impacts of Generative AI

Avijit Ghosh · Usman Gohar · Yacine Jernite · Lucie-Aimée Kaffee · Alberto Lusoli · Jennifer Mickel · Irene Solaiman · Arjun Subramonian · Zeerak Talat

MTG 16

Sun 15 Dec, 8:15 a.m. PST

Generative AI systems are becoming increasingly prevalent in society across modalities, producing content such as text, images, audio, and video, with far-reaching implications. The NeurIPS Broader Impact statement has notably shifted norms for AI publications to consider negative societal impact. However, no standard exists for how to approach these impact assessments. While new methods for evaluating social impact are being developed, notably through the NeurIPS Datasets and Benchmarks track, the lack of a standard for documenting their applicability, utility, and disparate coverage of different social impact categories stands in the way of broad adoption by developers and researchers of generative AI systems. By bringing together experts on the science and context of evaluation and practitioners who develop and analyze technical systems, we aim to help address this issue through the work of the NeurIPS community.
