Generative Actor-Critic
Aoyang Qin · Wei Wang · Deqian Kong · Ying Nian Wu · Song-Chun Zhu · Sirui Xie
Abstract
Conventional Reinforcement Learning (RL) algorithms, typically focused on estimating or maximizing expected returns, face challenges when finetuning offline-pretrained models online. This paper introduces Generative Actor-Critic (GAC), a novel framework that decouples sequential decision-making into two stages: learning a generative model of the joint distribution of trajectories and their returns, $p(\tau, y)$, and then performing decision-making via inference on this learned model. GAC offers a new perspective, framing \textit{policy evaluation} as learning this comprehensive distribution and enabling versatile \textit{policy improvement} strategies through inference. To operationalize GAC, we introduce a specific instantiation based on a latent variable model featuring a continuous latent plan vector. We develop novel inference strategies for both exploitation (optimizing latent plans for expected returns) and exploration (sampling latent plans conditioned on dynamically adjusted target returns). Experiments on Gym-MuJoCo benchmarks demonstrate GAC's strong offline performance and significantly enhanced offline-to-online adaptation compared to state-of-the-art methods, even in the absence of stepwise rewards.
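To make the two inference modes in the abstract concrete, the following is a minimal, hypothetical sketch, not the authors' implementation: it assumes a learned return head over a continuous latent plan vector $z$ (here a stand-in network `ReturnModel`), and illustrates exploitation as gradient ascent on $z$ and exploration as selecting sampled plans near a target return. All names and hyperparameters (`plan_dim`, `n_steps`, `lr`, `target_return`) are illustrative assumptions.

```python
# Hypothetical sketch of GAC-style latent-plan inference (assumptions, not the paper's code).
import torch
import torch.nn as nn

plan_dim = 16

class ReturnModel(nn.Module):
    """Stand-in for the learned return head y given latent plan z (assumption)."""
    def __init__(self, plan_dim):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(plan_dim, 64), nn.ReLU(), nn.Linear(64, 1))

    def forward(self, z):
        return self.net(z).squeeze(-1)

return_model = ReturnModel(plan_dim)

def exploit_plan(return_model, n_steps=50, lr=0.1):
    """Exploitation: optimize a latent plan z to maximize the predicted return."""
    z = torch.zeros(1, plan_dim, requires_grad=True)
    opt = torch.optim.Adam([z], lr=lr)
    for _ in range(n_steps):
        opt.zero_grad()
        loss = -return_model(z).mean()  # gradient ascent on expected return
        loss.backward()
        opt.step()
    return z.detach()

def explore_plan(return_model, target_return, n_samples=256):
    """Exploration: sample latent plans and keep the one whose predicted return
    is closest to a (dynamically adjusted) target return; conditioning is
    approximated here by simple sample selection."""
    z = torch.randn(n_samples, plan_dim)
    with torch.no_grad():
        y = return_model(z)
    idx = torch.argmin((y - target_return).abs())
    return z[idx:idx + 1]

z_exploit = exploit_plan(return_model)
z_explore = explore_plan(return_model, target_return=torch.tensor(100.0))
```

In an actual instantiation the return model would be the generative model of $p(\tau, y)$ described above, and the selected latent plan would condition action generation; the sketch only shows how the two inference strategies differ.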