DEPART: Hierarchical Multi-Agent System for Multi-Turn Interaction
Abstract
Large Language Models (LLMs) perform well on short-horizon tasks but struggle with long-horizon, multimodal scenarios that require multi-step reasoning, perception, and adaptive planning. We identify two key challenges in these settings: the difficulty of coordinating planning and execution over long horizons within single-agent architectures, and the inefficiency of indiscriminate visual grounding. To address these challenges, we propose \textbf{DEPART}, a hierarchical multi-agent framework that separates planning, perception, and execution across three specialized agents. These agents collaborate through an iterative loop: \textbf{D}ivide, \textbf{E}valuate, \textbf{P}lan, \textbf{A}ct, \textbf{R}eflect, and \textbf{T}rack, which supports dynamic task decomposition, selective visual grounding, and feedback-driven control. Evaluated on two web-based benchmarks, DEPART outperforms strong baselines, including agents enhanced with reinforcement learning, while improving efficiency by invoking vision only when needed. Our results highlight the value of modular, multi-agent reasoning for complex multimodal tasks.
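To make the loop concrete, the following is a minimal Python sketch of how the Divide/Evaluate/Plan/Act/Reflect/Track cycle could be wired across three agents; all class and method names (Planner, Perceiver, Executor, ground, reflect) are illustrative assumptions for exposition, not the implementation described in this paper.
\begin{verbatim}
# Hypothetical sketch of the DEPART control loop; agent interfaces,
# method names, and the stopping criterion are assumptions.
from dataclasses import dataclass, field

@dataclass
class State:
    subtasks: list = field(default_factory=list)
    history: list = field(default_factory=list)

class Planner:
    def divide(self, task):             # Divide: decompose the task
        return [f"{task}:step{i}" for i in range(3)]
    def plan(self, subtask, feedback):  # Plan: choose the next action
        return {"action": "click", "target": subtask,
                "needs_vision": bool(feedback)}

class Perceiver:
    def ground(self, action):           # selective grounding, on demand
        return {"bbox": (0, 0, 10, 10), "target": action["target"]}

class Executor:
    def act(self, action, grounding):   # Act: execute in the environment
        return {"ok": True, "obs": f"executed {action['action']}"}
    def reflect(self, result):          # Reflect: outcome -> feedback
        return None if result["ok"] else "retry with vision"

def depart(task):
    planner, perceiver, executor = Planner(), Perceiver(), Executor()
    state = State(subtasks=planner.divide(task))        # Divide
    feedback = None
    for subtask in list(state.subtasks):                # Evaluate pending subtasks
        while True:
            action = planner.plan(subtask, feedback)    # Plan
            grounding = (perceiver.ground(action)       # vision only when needed
                         if action["needs_vision"] else None)
            result = executor.act(action, grounding)    # Act
            feedback = executor.reflect(result)         # Reflect
            state.history.append((subtask, action, result))  # Track
            if feedback is None:
                break
    return state.history

if __name__ == "__main__":
    for record in depart("book a flight"):
        print(record)
\end{verbatim}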