
Is Model Ensemble Necessary? Model-based RL via a Single Model with Lipschitz Regularized Value Function
Ruijie Zheng · Xiyao Wang · Huazhe Xu · Furong Huang

Fri Dec 09 09:30 AM -- 09:45 AM (PST)
Event URL: https://openreview.net/forum?id=61XcDdGZclp

Probabilistic dynamics model ensembles are widely used in existing model-based reinforcement learning methods, as they outperform a single dynamics model in both asymptotic performance and sample efficiency. In this paper, we provide both practical and theoretical insights into the empirical success of the probabilistic dynamics model ensemble through the lens of Lipschitz continuity. We find that the stronger the Lipschitz condition on the value function, the smaller the gap between the Bellman operators induced by the true and the learned dynamics, which in turn allows the converged value function to be closer to the optimal value function. Hence, we hypothesize that the key functionality of the probabilistic dynamics model ensemble is to regularize the Lipschitz condition of the value function using generated samples. To validate this hypothesis, we devise two practical robust training mechanisms that directly regularize the Lipschitz condition of the value function: computing adversarial noise and regularizing the value network's spectral norm. Empirical results show that, combined with our mechanisms, model-based RL algorithms with a single dynamics model outperform those with an ensemble of probabilistic dynamics models. These findings not only support the theoretical insight but also provide a practical solution for developing computationally efficient model-based RL algorithms.
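As a minimal illustration of the second mechanism mentioned in the abstract (spectral-norm regularization of the value network), the sketch below estimates a weight matrix's spectral norm via power iteration and rescales the matrix by it. This is a generic sketch of the standard technique, not the authors' implementation; the matrix `W` and iteration count are illustrative assumptions. Dividing each layer's weights by its spectral norm bounds that layer's Lipschitz constant by 1 (for 1-Lipschitz activations such as ReLU).

```python
import numpy as np

def spectral_norm(W, n_iters=50):
    """Estimate the largest singular value of W via power iteration."""
    rng = np.random.default_rng(0)
    u = rng.standard_normal(W.shape[0])
    v = W.T @ u
    for _ in range(n_iters):
        v = W.T @ u
        v /= np.linalg.norm(v)
        u = W @ v
        u /= np.linalg.norm(u)
    # With unit vectors u, v, this Rayleigh-type quotient approximates sigma_max(W).
    return float(u @ W @ v)

# Illustrative weight matrix: its largest singular value is 3.0.
W = np.diag([3.0, 1.0, 0.5])
sigma = spectral_norm(W)
# Rescaled matrix has spectral norm ~1, capping the layer's Lipschitz constant.
W_normalized = W / sigma
```

Applying this rescaling to every linear layer of the value network is one common way to enforce a global Lipschitz bound, since the Lipschitz constant of a composition is at most the product of the layers' constants.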

Author Information

Ruijie Zheng (University of Maryland, College Park)
Xiyao Wang (Center for Research on Intelligent System and Engineering, Institute of Automation, CAS, University of Chinese Academy of Sciences)
Huazhe Xu (Tsinghua University)
Furong Huang (University of Maryland)