

Poster

Parallelizing Model-based Reinforcement Learning Over the Sequence Length

Zirui Wang · Yue DENG · Junfeng Long · Yin Zhang

West Ballroom A-D #6506
Wed 11 Dec 4:30 p.m. PST — 7:30 p.m. PST

Abstract:

Recently, model-based reinforcement learning (MBRL) methods have demonstrated stunning sample efficiency across various RL domains. However, this sample efficiency comes with extra training costs on three fronts: computation, memory, and training time. To address these challenges, we propose the Parallelized Model-based Reinforcement Learning (PaMoRL) framework. PaMoRL introduces two novel techniques, the Parallel World Model (PWM) and Parallelized Eligibility Trace Estimation (PETE), to parallelize both the model-learning and policy-learning stages of current MBRL methods over the sequence length. Our PaMoRL framework is hardware-efficient and stable, and it can be applied to tasks with either discrete or continuous action spaces using a single set of hyperparameters. The empirical results demonstrate that PWM and PETE significantly increase training speed without sacrificing inference efficiency. In terms of sample efficiency, PaMoRL maintains MBRL-level sample efficiency, outperforming other no-look-ahead MBRL methods and model-free RL methods, and on certain tasks it even exceeds planning-based MBRL methods and methods with larger networks.
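To illustrate the kind of sequence-length parallelization that eligibility-trace estimation admits (a minimal sketch under assumed variable names, not the paper's PETE implementation): the λ-return recurrence G_t = r_t + γ_t((1 − λ)V_{t+1} + λG_{t+1}) is a first-order linear recurrence, so it can be evaluated with an associative scan in logarithmic depth instead of a sequential backward loop, e.g. using `jax.lax.associative_scan`.

```python
# A minimal sketch (not the authors' code) of parallelizing lambda-return /
# eligibility-trace estimation over the sequence length with an associative
# scan. Assumes rewards r_t, discounts gamma_t, and values v_t of shape [T],
# plus a scalar bootstrap value for the state after the last step.
import jax
import jax.numpy as jnp


def lambda_returns_parallel(rewards, discounts, values, bootstrap, lam=0.95):
    """Compute G_t = r_t + gamma_t * ((1 - lam) * v_{t+1} + lam * G_{t+1})
    for all t via an associative scan (O(log T) depth) rather than a
    sequential backward recursion."""
    next_values = jnp.concatenate([values[1:], bootstrap[None]])
    # Rewrite the recurrence as an affine map: G_t = a_t + b_t * G_{t+1}.
    a = rewards + discounts * (1.0 - lam) * next_values
    b = discounts * lam
    # Reverse time so the backward recurrence becomes a forward one.
    a_rev, b_rev = a[::-1], b[::-1]
    # Fold the bootstrap G_T = bootstrap into the first (reversed) step.
    a_rev = a_rev.at[0].add(b_rev[0] * bootstrap)

    def combine(left, right):
        # Composing two affine maps y -> b*y + a yields another affine map.
        b_l, a_l = left
        b_r, a_r = right
        return b_r * b_l, b_r * a_l + a_r

    _, returns_rev = jax.lax.associative_scan(combine, (b_rev, a_rev))
    return returns_rev[::-1]
```

The same composition-of-affine-maps trick applies to other trace-like backward recursions (e.g. GAE); the scan keeps the computation exact while exposing parallelism over the sequence length on accelerators.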
