VideoGameQA-Bench: Evaluating Vision-Language Models for Video Game Quality Assurance
Abstract
As video games lead the entertainment industry in revenue, optimizing game development workflows is critical to the industry's long-term success. Recent advances in vision-language models (VLMs) hold significant potential to automate and enhance many aspects of game development, particularly video game quality assurance (QA), which remains one of the most labor-intensive processes with limited automation. Measuring VLM performance on video game QA tasks and evaluating their ability to handle real-world scenarios requires standardized benchmarks, yet existing benchmarks fall short in this domain. To bridge this gap, we introduce VideoGameQA-Bench, a comprehensive benchmark designed to encompass a wide range of game QA activities, including visual unit testing, visual regression testing, needle-in-a-haystack, glitch detection, and bug report generation for both images and videos.