Blocks, Bots, and Bottlenecks: Studying Real-time and Adaptive Multi-Agent LLM Collaboration
Abstract
Collaboration lies at the heart of human intelligence, whether brainstorming ideas, dividing responsibilities, or planning complex tasks together. Can large language models (LLMs) do the same? We introduce \mindcraft, a platform for studying multi-agent collaboration that combines real-time, adaptive communication with 47 in-game tools that let agents act in the open world of Minecraft. Alongside it, we present \minecollab, a benchmark for evaluating how well agents coordinate, plan, and execute tasks together. Our experiments show that LLM agents falter when collaboration demands clear and detailed communication, with performance dropping by up to 15\% when they must articulate step-by-step plans. These findings suggest that while today's agents can act, effective collaboration still hinges on using language as a medium for shared understanding and joint reasoning. Video demonstrations of our agents' capabilities and failure modes are available at \url{https://mindcraft-minecollab.github.io/index.html}.