Poster

Parallelizing Model-based Reinforcement Learning Over the Sequence Length

Zirui Wang · Yue DENG · Junfeng Long · Yin Zhang

2024 Poster

[ Paper] [ Slides] [ Poster] [ OpenReview]

Abstract

Recently, Model-based Reinforcement Learning (MBRL) methods have demonstrated stunning sample efficiency in various RL domains.However, achieving this extraordinary sample efficiency comes with additional training costs in terms of computations, memory, and training time.To address these challenges, we propose the Parallelized Model-based Reinforcement Learning (PaMoRL) framework.PaMoRL introduces two novel techniques: the Parallel World Model (PWM) and the Parallelized Eligibility Trace Estimation (PETE) to parallelize both model learning and policy learning stages of current MBRL methods over the sequence length.Our PaMoRL framework is hardware-efficient and stable, and it can be applied to various tasks with discrete or continuous action spaces using a single set of hyperparameters.The empirical results demonstrate that the PWM and PETE within PaMoRL significantly increase training speed without sacrificing inference efficiency.In terms of sample efficiency, PaMoRL maintains an MBRL-level sample efficiency that outperforms other no-look-ahead MBRL methods and model-free RL methods, and it even exceeds the performance of planning-based MBRL methods and methods with larger networks in certain tasks.

Video

Chat is not available.