ATLAS: Actor-Critic Task-completion with Look-ahead Action Simulation
Abstract
We introduce ATLAS (Actor-Critic Task-completion with Look-ahead Action Simulation), a web agent that combines a hierarchical planner with an internal world-model, simulating the outcomes of candidate actions to choose the best course of action before execution. ATLAS starts by building a "cognitive map" - a simple world-model of the environment - through lightweight, curiosity-driven exploration. The planner proposes candidate actions; a simulator predicts their consequences in natural language; a critic analyzes the options to select the best roll-out; and a browser executor performs the chosen action. On the WebArena benchmarks, ATLAS attains state-of-the-art success, exceeding previously reported systems by a large margin: on the WebArena-lite benchmark, ATLAS achieves a ∼63% success rate, compared to 54% for the previously published state of the art. Unlike previous systems, our modular architecture requires no website-specific model fine-tuning. Ablations show sizable drops when the world-model, the hierarchical planner, or the lookahead-based replanner is removed, confirming their complementary roles.
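
As an illustration of the propose, simulate, critique, execute loop described in the abstract, the following is a minimal Python sketch and not the ATLAS implementation. Everything in it is an assumption made for illustration: the toy site `TOY_SITE`, the function names `explore`, `propose`, `simulate`, `critic_score`, and `run_episode`, and the choice of a dictionary as the cognitive map. The simulator and critic described in the abstract operate on natural-language predictions; here they are replaced by a table lookup and a shortest-path score so that the example runs deterministically, and the exploration step exhaustively visits the toy site rather than being curiosity driven.

```python
from collections import deque
from math import inf
from typing import Dict, List, Tuple

# Hypothetical cognitive map: (page, action) -> predicted next page.
CognitiveMap = Dict[Tuple[str, str], str]

# Toy environment (made up for illustration): page -> {action: next page}.
TOY_SITE: Dict[str, Dict[str, str]] = {
    "home": {"click_orders": "orders", "click_help": "help"},
    "orders": {"click_latest": "order_detail", "click_home": "home"},
    "help": {"click_home": "home"},
    "order_detail": {},
}


def explore(site: Dict[str, Dict[str, str]], start: str) -> CognitiveMap:
    """Exploration stand-in: record every transition reachable from start."""
    cognitive_map: CognitiveMap = {}
    frontier, seen = [start], {start}
    while frontier:
        page = frontier.pop()
        for action, nxt in site[page].items():
            cognitive_map[(page, action)] = nxt
            if nxt not in seen:
                seen.add(nxt)
                frontier.append(nxt)
    return cognitive_map


def propose(site: Dict[str, Dict[str, str]], page: str) -> List[str]:
    """Planner stand-in: propose every action available on the page."""
    return list(site[page])


def simulate(cognitive_map: CognitiveMap, page: str, action: str) -> str:
    """Simulator stand-in: predict the outcome without touching the site."""
    return cognitive_map.get((page, action), "unknown")


def critic_score(cognitive_map: CognitiveMap, predicted: str, goal: str) -> float:
    """Critic stand-in: predicted number of hops to the goal (lower is better)."""
    frontier, seen = deque([(predicted, 0)]), {predicted}
    while frontier:
        page, hops = frontier.popleft()
        if page == goal:
            return hops
        for (src, _), dst in cognitive_map.items():
            if src == page and dst not in seen:
                seen.add(dst)
                frontier.append((dst, hops + 1))
    return inf  # goal not reachable according to the cognitive map


def run_episode(site: Dict[str, Dict[str, str]], start: str, goal: str,
                max_steps: int = 10) -> List[str]:
    """Look ahead with the map, pick the best candidate, then execute it."""
    cognitive_map = explore(site, start)
    page, trace = start, []
    for _ in range(max_steps):
        if page == goal:
            break
        candidates = propose(site, page)
        if not candidates:
            break
        best = min(
            candidates,
            key=lambda a: critic_score(
                cognitive_map, simulate(cognitive_map, page, a), goal),
        )
        page = site[page][best]  # executor stand-in: act on the real site
        trace.append(best)
    return trace


print(run_episode(TOY_SITE, "home", "order_detail"))
```

In this sketch the agent only acts on the real site after every candidate has been scored against the cognitive map, which mirrors the ordering of look-ahead before execution. Because the loop re-scores candidates at every step, it also loosely reflects replanning, though with none of the hierarchy the abstract describes.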