Poster
in
Workshop: Generative AI for Education (GAIED): Advances, Opportunities, and Challenges
Paper 44: Evaluating ChatGPT-generated Textbook Questions using IRT
Shreya Bhandari · Yunting Liu · Zachary Pardos
Keywords: [ Algebra ] [ IRT ] [ Linking ] [ question generation ] [ Large language models ] [ education ] [ ChatGPT ] [ Measurement ] [ Generative AI ] [ Psychometric ]
We aim to test the ability of ChatGPT to generate educational assessment questions given only a summary of textbook content. We take a psychometric measurement approach, comparing the quality of questions, or items, generated by ChatGPT against gold-standard questions from a published textbook. We use Item Response Theory (IRT) to analyze data from 207 test respondents answering questions from OpenStax College Algebra. Using a common-item linking design, we find that ChatGPT items fared as well as or better than textbook items: they better distinguished respondents within the moderate ability range and showed higher discriminating power than OpenStax items (discrimination of 1.92 for ChatGPT vs. 1.54 for OpenStax).
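The discrimination values reported above can be read through the lens of a logistic IRT model. A minimal sketch, assuming the standard two-parameter logistic (2PL) model (the abstract does not state which IRT model was fit, and the difficulty value b = 0.0 below is purely illustrative), shows why a higher discrimination parameter means the item separates respondents near its difficulty level more sharply:

```python
import math

def p_correct(theta: float, a: float, b: float) -> float:
    """2PL IRT item response function: probability that a respondent
    with ability theta answers an item with discrimination a and
    difficulty b correctly."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

# Compare item response curves at the two reported discriminations.
# b = 0.0 is an illustrative assumption, not a value from the paper.
for theta in (-1.0, 0.0, 1.0):
    p_gpt = p_correct(theta, a=1.92, b=0.0)   # ChatGPT item discrimination
    p_osx = p_correct(theta, a=1.54, b=0.0)   # OpenStax item discrimination
    print(f"theta={theta:+.1f}  a=1.92 -> p={p_gpt:.2f}  a=1.54 -> p={p_osx:.2f}")
```

With the higher discrimination, the probability of a correct response rises more steeply as ability crosses the item's difficulty, which is what makes such an item better at distinguishing respondents of moderate ability.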