Oral in Workshop: Evaluating Evaluations: Examining Best Practices for Measuring Broader Impacts of Generative AI

GenAI Evaluation Maturity Framework (GEMF) to assess and improve GenAI Evaluations

Yilin Zhang ⋅ Frank J. Kanayet

Keywords: reliability, framework, difficulty, representativity, accuracy, Generative AI, diversity, evaluation, efficiency, robustness

Abstract

We introduce a general framework to assess and improve the maturity of GenAI evaluations across two areas, Prompts and Labels, each with multiple dimensions. The GEMF assessment produces a report card with a maturity level for each prompt and label dimension, a comprehensive summary of the status of the GenAI evaluation, and suggested directions for improvement.
