CLaP: Conditional Latent Planners for Offline Reinforcement Learning
Harry Shin · Rose Wang
Event URL: https://openreview.net/forum?id=OQP7leJkAu
Recent work has formulated offline reinforcement learning (RL) as a sequence modeling problem, benefiting from the simplicity and scalability of the Transformer architecture. However, sequence models struggle to model trajectories that are long-horizon or involve complicated environment dynamics. We propose CLaP (Conditional Latent Planners) to learn a simple goal-conditioned latent space from offline agent behavior, and to incrementally decode good actions from a latent plan. We evaluate our method on continuous control domains from the D4RL benchmark. Compared to non-sequential models and return-conditioned sequential models, CLaP achieves competitive, if not better, performance across continuous control tasks. It does particularly well in environments with complex transition dynamics, with up to a $+149.8\%$ performance increase. Our results suggest that decision-making is easier with simplified latent dynamics that model behavior as goal-conditioned.
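The abstract describes a two-stage idea: encode offline behavior into a goal-conditioned latent plan, then decode actions step by step from that plan. Below is a minimal sketch of one plausible reading of that setup, not the authors' implementation; the module names, dimensions, and behavior-cloning objective are illustrative assumptions only.

```python
# Illustrative sketch (assumed PyTorch encoder/decoder; not the CLaP codebase).
import torch
import torch.nn as nn

class LatentPlanner(nn.Module):
    """Encodes (state, goal) into a latent plan z; decodes actions step by step."""

    def __init__(self, state_dim, goal_dim, action_dim, latent_dim=32, hidden=256):
        super().__init__()
        # Encoder: compresses the current state and a goal into a latent plan.
        self.encoder = nn.Sequential(
            nn.Linear(state_dim + goal_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, latent_dim),
        )
        # Decoder: conditioned on (state, z), proposes the next action.
        self.decoder = nn.Sequential(
            nn.Linear(state_dim + latent_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, action_dim),
        )

    def plan(self, state, goal):
        return self.encoder(torch.cat([state, goal], dim=-1))

    def act(self, state, z):
        return self.decoder(torch.cat([state, z], dim=-1))


if __name__ == "__main__":
    # Toy offline batch: goals could be future states from the same trajectory.
    model = LatentPlanner(state_dim=17, goal_dim=17, action_dim=6)
    states, goals = torch.randn(64, 17), torch.randn(64, 17)
    actions = torch.randn(64, 6)
    z = model.plan(states, goals)
    loss = nn.functional.mse_loss(model.act(states, z), actions)
    loss.backward()
```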
Author Information
Harry Shin (Stanford University)
Rose Wang (Stanford)
More from the Same Authors
- 2022: In the ZONE: Measuring difficulty and progression in curriculum generation
  Rose Wang · Jesse Mu · Dilip Arumugam · Natasha Jaques · Noah Goodman
- 2022 Poster: ELIGN: Expectation Alignment as a Multi-Agent Intrinsic Reward
  Zixian Ma · Rose Wang · Fei-Fei Li · Michael Bernstein · Ranjay Krishna
- 2020 Workshop: Resistance AI Workshop
  Suzanne Kite · Mattie Tesfaldet · J Khadijah Abdurahman · William Agnew · Elliot Creager · Agata Foryciarz · Raphael Gontijo Lopes · Pratyusha Kalluri · Marie-Therese Png · Manuel Sabin · Maria Skoularidou · Ramon Vilarino · Rose Wang · Sayash Kapoor · Micah Carroll