CoLLAB: A Framework for Designing Scalable Benchmarks for Agentic LLMs
Saaduddin Mahmud · Eugene Bagdasarian · Shlomo Zilberstein
Abstract
Inspired by classic multi-agent problem-solving paradigms, we present a framework for designing scalable environments to evaluate LLM agents. The Coordinating LLM Agent Benchmark (CoLLAB) adapts the Distributed Constraint Optimization Problem (DCOP) framework to LLM settings. We provide a design blueprint showing how CoLLAB environments can scale along multiple dimensions, including problem-description modality, decision-making complexity, number of agents, and agent–environment interaction. We then present three case-study environments built under this framework and evaluate LLM agents on them to analyze how scaling affects performance.
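For context, the formalization that CoLLAB adapts is the standard DCOP model from the multi-agent literature; a common version (the paper's exact variant may differ) defines a DCOP as a tuple
\[
\langle \mathcal{A}, \mathcal{X}, \mathcal{D}, \mathcal{F}, \alpha \rangle,
\]
where \(\mathcal{A}\) is a set of agents, \(\mathcal{X}\) a set of decision variables, \(\mathcal{D}\) their finite domains, \(\mathcal{F}\) a set of constraint cost (or utility) functions defined over subsets of \(\mathcal{X}\), and \(\alpha : \mathcal{X} \to \mathcal{A}\) maps each variable to the agent that controls it. The objective is a complete assignment \(\sigma\) that optimizes the aggregate constraint value
\[
F(\sigma) = \sum_{f \in \mathcal{F}} f\big(\sigma|_{\mathrm{scope}(f)}\big).
\]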