MuZero Unplugged presents a promising approach for offline policy learning from logged data. It conducts Monte-Carlo Tree Search (MCTS) with a learned model and leverages the Reanalyze algorithm to learn purely from offline data. For good performance, MCTS requires accurate learned models and a large number of simulations, and therefore incurs a high computational cost. This paper investigates several hypotheses about settings in which MuZero Unplugged may not work well in offline RL: 1) learning with limited data coverage; 2) learning from offline data of stochastic environments; 3) learning with improperly parameterized models given the offline data; and 4) learning with a low compute budget. We propose a regularized one-step look-ahead approach to tackle these issues. Instead of planning with the expensive MCTS, we use the learned model to construct an advantage estimate based on a one-step rollout. The policy is improved in the direction that maximizes the estimated advantage, with regularization toward the dataset. We conduct extensive empirical studies on BSuite environments to verify the hypotheses and then run our algorithm on the RL Unplugged Atari benchmark. Experimental results show that the proposed approach achieves stable performance even with an inaccurate learned model. On the large-scale Atari benchmark, the proposed method outperforms MuZero Unplugged by 43%. Most significantly, it needs only 5.6% of the wall-clock time (1 hour versus 17.8 hours for MuZero Unplugged) to achieve a 150% IQM normalized score with the same hardware and software stacks.
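The one-step look-ahead idea in the abstract can be illustrated with a minimal sketch. Everything here is an assumption made for illustration, not the authors' implementation: the function names, the toy numbers, and in particular the choice of regularizer. The sketch forms the advantage as A(s, a) = r(s, a) + γ·v(s') − v(s) from hypothetical model outputs, and then uses one common KL-regularized form, a target policy proportional to prior(a)·exp(A(s, a)/τ), where the prior stands in for a policy fitted to the dataset.

```python
import math


def one_step_advantages(rewards, next_values, value, discount=0.99):
    """Return A(s, a) = r(s, a) + discount * v(s') - v(s) for each action.

    `rewards` and `next_values` are hypothetical one-step model predictions,
    one entry per action; `value` is the predicted value of the current state.
    """
    return [r + discount * nv - value for r, nv in zip(rewards, next_values)]


def regularized_target(prior, advantages, temperature=1.0):
    """Return a target policy proportional to prior(a) * exp(A(a) / temperature).

    This is an assumed, commonly used KL-regularized improvement step: actions
    with zero prior probability stay at zero, which keeps the target close to
    the data-fitted prior.
    """
    shift = max(advantages)  # subtract the maximum for numerical stability
    weights = [
        p * math.exp((a - shift) / temperature)
        for p, a in zip(prior, advantages)
    ]
    total = sum(weights)
    return [w / total for w in weights]


# Toy numbers (made up) for a single state with three actions.
rewards = [0.0, 1.0, 0.5]
next_values = [1.0, 0.0, 2.0]
state_value = 1.0
prior = [0.5, 0.3, 0.2]

advantages = one_step_advantages(rewards, next_values, state_value, discount=1.0)
target = regularized_target(prior, advantages)
```

In this toy case only the third action has a positive advantage, so the target shifts probability mass toward it while the first two actions keep the same ratio they had under the prior. A lower temperature would move the target further from the prior; a higher one would keep it closer.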
Author Information
Zichen Liu (National University of Singapore)
Siyi Li (Sea AI Lab)
Wee Sun Lee (National University of Singapore)
Wee Sun Lee is a professor in the Department of Computer Science, National University of Singapore. He obtained his B.Eng from the University of Queensland in 1992 and his Ph.D. from the Australian National University in 1996. He has been a research fellow at the Australian Defence Force Academy, a fellow of the Singapore-MIT Alliance, and a visiting scientist at MIT. His research interests include machine learning, planning under uncertainty, and approximate inference. His work has won the Test of Time Award at Robotics: Science and Systems (RSS) 2021, the RoboCup Best Paper Award at the International Conference on Intelligent Robots and Systems (IROS) 2015, and the Google Best Student Paper Award at Uncertainty in AI (UAI) 2014 (as faculty co-author), as well as several competitions and challenges. He has been an area chair for machine learning and AI conferences such as Neural Information Processing Systems (NeurIPS), the International Conference on Machine Learning (ICML), the AAAI Conference on Artificial Intelligence (AAAI), and the International Joint Conference on Artificial Intelligence (IJCAI). He was a program, conference and journal track co-chair for the Asian Conference on Machine Learning (ACML), and he is currently the co-chair of the steering committee of ACML.
Shuicheng Yan (Sea AI Lab)
Zhongwen Xu (Sea AI Lab)
More from the Same Authors
-
2022 Poster: Inception Transformer »
Chenyang Si · Weihao Yu · Pan Zhou · Yichen Zhou · Xinchao Wang · Shuicheng Yan -
2022 : Adan: Adaptive Nesterov Momentum Algorithm for Faster Optimizing Deep Models »
Xingyu Xie · Pan Zhou · Huan Li · Zhouchen Lin · Shuicheng Yan -
2022 : Win: Weight-Decay-Integrated Nesterov Acceleration for Adaptive Gradient Algorithms »
Pan Zhou · Xingyu Xie · Shuicheng Yan -
2022 : Boosting Offline Reinforcement Learning via Data Resampling »
Yang Yue · Bingyi Kang · Xiao Ma · Zhongwen Xu · Gao Huang · Shuicheng Yan -
2022 : Mutual Information Regularized Offline Reinforcement Learning »
Xiao Ma · Bingyi Kang · Zhongwen Xu · Min Lin · Shuicheng Yan -
2022 : HloEnv: A Graph Rewrite Environment for Deep Learning Compiler Optimization Research »
Chin Yang Oh · Kunhao Zheng · Bingyi Kang · Xinyi Wan · Zhongwen Xu · Shuicheng Yan · Min Lin · Yangzihao Wang -
2022 : Visual Imitation Learning with Patch Rewards »
Minghuan Liu · Tairan He · Weinan Zhang · Shuicheng Yan · Zhongwen Xu -
2022 Spotlight: Inception Transformer »
Chenyang Si · Weihao Yu · Pan Zhou · Yichen Zhou · Xinchao Wang · Shuicheng Yan -
2022 Spotlight: Lightning Talks 2B-1 »
Yehui Tang · Jian Wang · Zheng Chen · man zhou · Peng Gao · Chenyang Si · SHANGKUN SUN · Yixing Xu · Weihao Yu · Xinghao Chen · Kai Han · Hu Yu · Yulun Zhang · Chenhui Gou · Teli Ma · Yuanqi Chen · Yunhe Wang · Hongsheng Li · Jinjin Gu · Jianyuan Guo · Qiman Wu · Pan Zhou · Yu Zhu · Jie Huang · Chang Xu · Yichen Zhou · Haocheng Feng · Guodong Guo · yongbing zhang · Ziyi Lin · Feng Zhao · Ge Li · Junyu Han · Jinwei Gu · Jifeng Dai · Chao Xu · Xinchao Wang · Linghe Kong · Shuicheng Yan · Yu Qiao · Chen Change Loy · Xin Yuan · Errui Ding · Yunhe Wang · Deyu Meng · Jingdong Wang · Chongyi Li -
2022 Poster: EnvPool: A Highly Parallel Reinforcement Learning Environment Execution Engine »
Jiayi Weng · Min Lin · Shengyi Huang · Bo Liu · Denys Makoviichuk · Viktor Makoviychuk · Zichen Liu · Yufan Song · Ting Luo · Yukun Jiang · Zhongwen Xu · Shuicheng Yan -
2021 : Part 4: Appendix: Proofs and Derivations »
Wee Sun Lee -
2021 : Part 3: Graph Neural Networks and Attention Networks »
Wee Sun Lee -
2021 : Part 2: Markov Decision Process »
Wee Sun Lee -
2021 Tutorial: Message Passing In Machine Learning »
Wee Sun Lee -
2021 : Part 1: Message Passing Overview and Probabilistic Graphical Models »
Wee Sun Lee -
2020 Poster: Factor Graph Neural Networks »
Zhen Zhang · Fan Wu · Wee Sun Lee -
2017 Poster: QMDP-Net: Deep Learning for Planning under Partial Observability »
Peter Karkus · David Hsu · Wee Sun Lee -
2015 Poster: Adaptive Stochastic Optimization: From Sets to Paths »
Zhan Wei Lim · David Hsu · Wee Sun Lee -
2013 Poster: DESPOT: Online POMDP Planning with Regularization »
Adhiraj Somani · Nan Ye · David Hsu · Wee Sun Lee -
2013 Poster: Learning with Invariance via Linear Functionals on Reproducing Kernel Hilbert Space »
Xinhua Zhang · Wee Sun Lee · Yee Whye Teh -
2013 Spotlight: Learning with Invariance via Linear Functionals on Reproducing Kernel Hilbert Space »
Xinhua Zhang · Wee Sun Lee · Yee Whye Teh -
2013 Poster: Active Learning for Probabilistic Hypotheses Using the Maximum Gibbs Error Criterion »
Nguyen Viet Cuong · Wee Sun Lee · Nan Ye · Kian Ming Adam Chai · Hai Leong Chieu -
2011 Poster: Monte Carlo Value Iteration with Macro-Actions »
Zhan Wei Lim · David Hsu · Wee Sun Lee -
2010 Session: Oral Session 2 »
Wee Sun Lee -
2009 Poster: Conditional Random Fields with High-Order Features for Sequence Labeling »
Nan Ye · Wee Sun Lee · Hai Leong Chieu · Dan Wu -
2007 Poster: Cooled and Relaxed Survey Propagation for MRFs »
Hai Leong Chieu · Wee Sun Lee · Yee Whye Teh -
2007 Spotlight: Cooled and Relaxed Survey Propagation for MRFs »
Hai Leong Chieu · Wee Sun Lee · Yee Whye Teh -
2007 Spotlight: What makes some POMDP problems easy to approximate? »
David Hsu · Wee Sun Lee · Nan Rong -
2007 Poster: What makes some POMDP problems easy to approximate? »
David Hsu · Wee Sun Lee · Nan Rong -
2006 Poster: Hyperparameter Learning for Graph Based Semi-supervised Learning Algorithms »
Xinhua Zhang · Wee Sun Lee