CoLLAB: A Framework for Designing Scalable Benchmarks for Agentic LLMs
Saaduddin Mahmud · Eugene Bagdasarian · Shlomo Zilberstein
Abstract
Inspired by classic multi-agent problem-solving paradigms, we present a framework for designing scalable environments to evaluate LLM agents. The Coordinating LLM Agent Benchmark (CoLLAB) adapts the Distributed Constraint Optimization Problem (DCOP) framework to LLM settings. We provide a design blueprint showing how CoLLAB environments can scale along multiple dimensions, including problem-description modality, decision-making complexity, number of agents, and agent–environment interaction. We then present three case-study environments built under this framework and evaluate LLM agents on them to analyze how scaling affects performance.
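For context, the formalization that CoLLAB adapts is the standard DCOP model from the multi-agent literature; a common version (the paper's exact variant may differ) defines a DCOP as a tuple
\[
\langle \mathcal{A}, \mathcal{X}, \mathcal{D}, \mathcal{F}, \alpha \rangle,
\]
where \(\mathcal{A}\) is a set of agents, \(\mathcal{X}\) a set of decision variables, \(\mathcal{D}\) their finite domains, \(\mathcal{F}\) a set of constraint cost (or utility) functions defined over subsets of \(\mathcal{X}\), and \(\alpha : \mathcal{X} \to \mathcal{A}\) maps each variable to the agent that controls it. The objective is a complete assignment \(\sigma\) that optimizes the aggregate constraint value
\[
F(\sigma) = \sum_{f \in \mathcal{F}} f\big(\sigma|_{\mathrm{scope}(f)}\big).
\]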