NeurIPS 2025 Competition: MMU-RAGent: Massive Multi-Modal User-Centric Retrieval Augmented Generation Benchmark
Abstract
We introduce the first competition to evaluate RAG systems on real-user queries and feedback, leverage web-scale corpora, and support both text and video generation. Participants develop systems that respond to real-user queries, either curated from MS MARCO Web Search and Chatbot Arena Conversations or collected live via our RAG-Arena platform. To support retrieval at scale, we provide API access to the English subsets of ClueWeb22-B and ClueWeb22-A (87M and 800M documents, respectively), along with AWS-hosted infrastructure to facilitate system deployment on our RAG-Arena platform. Systems are evaluated using a combination of human Likert-scale ratings, live preference judgments via RAG-Arena, LLM-as-a-Judge evaluation, and automatic metrics. To support flexibility in system design, we accept submissions that leverage proprietary search APIs or models alongside open-source approaches. Participants are encouraged to clearly document system components, and separate leaderboard categories ensure fair and transparent comparison across open- and closed-source systems. By focusing on user needs, large-scale retrieval, and multimodal generation, this competition aims to push academic RAG research toward more scalable, user-aligned settings.
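To make the retrieval setup concrete, the sketch below shows how a participant system might query a ClueWeb22 retrieval service over HTTP. The endpoint URL, function name, parameter names (`query`, `corpus`, `k`), and response schema are illustrative assumptions for this sketch, not the competition's actual API.

```python
import requests

# Hypothetical illustration only: the endpoint, parameters, and response
# schema are assumptions, not the competition's documented interface.
API_URL = "https://example.org/clueweb22/search"  # placeholder URL


def retrieve(query: str, corpus: str = "clueweb22-b", k: int = 10):
    """Fetch the top-k documents for a query from the (assumed) retrieval API."""
    resp = requests.get(
        API_URL,
        params={"query": query, "corpus": corpus, "k": k},
        timeout=30,
    )
    resp.raise_for_status()
    # Assumed response format: a JSON list of {"docid": ..., "text": ...}
    return resp.json()


if __name__ == "__main__":
    for doc in retrieve("retrieval augmented generation benchmarks"):
        print(doc["docid"])
```

A participant system would typically feed the returned passages into its generator as context; the same call pattern applies whether the backing index is ClueWeb22-B or the larger ClueWeb22-A.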
Schedule
[Schedule table: sessions at 2:15 PM, 2:45 PM, 3:15 PM, 3:30 PM, and 4:40 PM; session titles not recoverable]