NeurIPS Paper 44: Evaluating ChatGPT-generated Textbook Questions using IRT

Poster in Workshop: Generative AI for Education (GAIED): Advances, Opportunities, and Challenges

Shreya Bhandari · Yunting Liu · Zachary Pardos

Keywords: [ Psychometric ] [ Generative AI ] [ Measurement ] [ ChatGPT ] [ education ] [ Large language models ] [ question generation ] [ Linking ] [ IRT ] [ Algebra ]

Abstract:

We aim to test the ability of ChatGPT to generate educational assessment questions given only a summary of textbook content. Taking a psychometric measurement approach, we compare the quality of questions, or items, generated by ChatGPT against gold-standard questions from a published textbook. We use Item Response Theory (IRT) to analyze data from 207 test respondents answering questions from OpenStax College Algebra. Using a common-item linking design, we find that ChatGPT items fared as well as or better than textbook items: they distinguished better among respondents in the moderate ability group and had higher discriminating power than OpenStax items (discrimination of 1.92 for ChatGPT vs. 1.54 for OpenStax).
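
To make the discrimination figures concrete, the following is a minimal sketch of how a discrimination parameter shapes an item's behavior. It assumes a two-parameter logistic (2PL) IRT model, which the abstract does not confirm, and it sets item difficulty to 0 purely for illustration; the names `p_correct`, `item_information`, and `B_ASSUMED` are hypothetical. Only the two discrimination values come from the abstract.

```python
import math

# Discrimination values reported in the abstract.
A_CHATGPT = 1.92
A_OPENSTAX = 1.54

# Assumption for illustration only: the abstract reports no difficulty values.
B_ASSUMED = 0.0


def p_correct(theta: float, a: float, b: float) -> float:
    """2PL probability that a respondent with ability theta answers correctly."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))


def item_information(theta: float, a: float, b: float) -> float:
    """Fisher information of a 2PL item at ability theta: a^2 * P * (1 - P)."""
    p = p_correct(theta, a, b)
    return a * a * p * (1.0 - p)


if __name__ == "__main__":
    # Compare how much information each item type carries across abilities.
    for theta in (-2.0, -1.0, 0.0, 1.0, 2.0):
        info_gpt = item_information(theta, A_CHATGPT, B_ASSUMED)
        info_ost = item_information(theta, A_OPENSTAX, B_ASSUMED)
        print(f"theta={theta:+.1f}  ChatGPT={info_gpt:.3f}  OpenStax={info_ost:.3f}")
```

Under these assumptions, the item with the higher discrimination carries more information near its difficulty (moderate ability) and less at the extremes, which is one way to read the abstract's statement about the moderate ability group.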
