EchoRL: Learning to Plan through Experience for Efficient Reinforcement Learning
Abstract
Recent advances in large language models (LLMs) have spurred interest in reinforcement learning (RL) as a mechanism for improving reasoning and generalization. However, existing RL-augmented LLM pipelines rely largely on reactive, token-level adaptation, with limited support for planning, experience reuse, or efficient rollouts. Motivated by the vision of an “experience era”, we present \textbf{EchoRL}, a system framework that bridges reaction and planning in real-time RL through experience-grounded infrastructure. EchoRL introduces three key innovations: (1) a latent planning optimization that enables structured rollouts with continuation-based reasoning; (2) an asynchronous execution engine with KV-cache sharing and token-level dispatch; and (3) a prioritized replay system stratified into hot/cold buffers for improved RL training efficiency.
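To make the third component concrete, the following is a minimal Python sketch of a prioritized replay buffer stratified into hot and cold tiers. It is an illustration of the general idea only, not EchoRL's implementation: the class and method names (\texttt{HotColdReplay}, \texttt{add}, \texttt{sample}), the priority heuristic, and the capacity and threshold values are all assumptions introduced here for exposition.
\begin{verbatim}
# Hypothetical sketch of a hot/cold prioritized replay buffer.
# All names, thresholds, and capacities are illustrative assumptions,
# not the API described in the paper.
import random
from collections import deque
from dataclasses import dataclass

@dataclass
class Transition:
    trajectory: list          # token-level rollout (placeholder type)
    reward: float
    priority: float = 1.0     # e.g. |advantage| or |TD error|

class HotColdReplay:
    """Two-tier prioritized replay: a small 'hot' buffer for
    high-priority experience and a large 'cold' buffer for the rest."""

    def __init__(self, hot_capacity=1000, cold_capacity=50000,
                 hot_threshold=0.5):
        self.hot = deque(maxlen=hot_capacity)
        self.cold = deque(maxlen=cold_capacity)
        self.hot_threshold = hot_threshold  # priority cutoff for hot tier

    def add(self, item: Transition):
        # Route by priority at insertion time.
        tier = self.hot if item.priority >= self.hot_threshold else self.cold
        tier.append(item)

    def _weighted_draw(self, buf, k):
        if not buf or k <= 0:
            return []
        items = list(buf)
        weights = [t.priority for t in items]
        return random.choices(items, weights=weights, k=k)

    def sample(self, batch_size, hot_fraction=0.7):
        # Oversample the hot tier; fill the remainder from cold storage.
        n_hot = min(int(batch_size * hot_fraction), len(self.hot))
        batch = self._weighted_draw(self.hot, n_hot)
        batch += self._weighted_draw(self.cold, batch_size - n_hot)
        return batch

    def update_priority(self, item: Transition, new_priority: float):
        # After a training step, refresh priorities and demote stale
        # hot items into the cold buffer.
        item.priority = new_priority
        if new_priority < self.hot_threshold and item in self.hot:
            self.hot.remove(item)
            self.cold.append(item)
\end{verbatim}
Under this sketch, training would oversample recent high-priority trajectories from the hot tier while still drawing from the cold tier, which is one plausible way stratified prioritization can improve RL training efficiency.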