Learning effective reinforcement learning (RL) policies to solve real-world complex tasks can be quite challenging without a high-fidelity simulation environment. In most cases, we are only given imperfect simulators with simplified dynamics, which inevitably lead to severe sim-to-real gaps in RL policy learning. The recently emerged field of offline RL provides another possibility to learn policies directly from pre-collected historical data. However, to achieve reasonable performance, existing offline RL algorithms need impractically large offline datasets with sufficient state-action space coverage for training. This brings up a new question: is it possible to combine learning from limited real data in offline RL and unrestricted exploration through imperfect simulators in online RL to address the drawbacks of both approaches? In this study, we propose the Dynamics-Aware Hybrid Offline-and-Online Reinforcement Learning (H2O) framework to provide an affirmative answer to this question. H2O introduces a dynamics-aware policy evaluation scheme, which adaptively penalizes Q-function learning on simulated state-action pairs with large dynamics gaps, while simultaneously allowing learning from a fixed real-world dataset. Through extensive simulation and real-world tasks, as well as theoretical analysis, we demonstrate the superior performance of H2O against other cross-domain online and offline RL algorithms. H2O provides a brand new hybrid offline-and-online RL paradigm, which can potentially shed light on future RL algorithm design for solving practical real-world tasks.
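To make the idea of adaptively penalizing simulated samples concrete, the following is a minimal sketch and not the authors' implementation. It assumes a per-sample dynamics-gap estimate is already available for each simulated state-action pair, and it assumes a softmax weighting over those gaps; the function name `dynamics_aware_penalty`, its parameters, and the `temperature` knob are all hypothetical. The returned value is a regularizer that pushes Q-values down on simulated pairs with large estimated gaps and up on real-data pairs.

```python
import math


def dynamics_aware_penalty(q_sim, gap_sim, q_real, temperature=1.0):
    """Hypothetical value regularizer (illustrative only, not the paper's code).

    q_sim:   Q-values of simulated state-action pairs
    gap_sim: assumed per-sample estimates of the sim-to-real dynamics gap
    q_real:  Q-values of state-action pairs from the fixed real dataset
    """
    if not q_sim or len(q_sim) != len(gap_sim) or not q_real:
        raise ValueError("need matching non-empty simulated inputs and real Q-values")

    # Softmax over the gap estimates (an assumption of this sketch):
    # simulated samples with a larger gap receive a larger weight.
    scaled = [g / temperature for g in gap_sim]
    shift = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - shift) for s in scaled]
    total = sum(exps)
    weights = [e / total for e in exps]

    # Gap-weighted Q on simulated data minus mean Q on real data.
    weighted_sim_q = sum(w * q for w, q in zip(weights, q_sim))
    mean_real_q = sum(q_real) / len(q_real)
    return weighted_sim_q - mean_real_q
```

In a training loop, a term like this would typically be added, with some coefficient, to a standard Bellman error computed on the mixed simulated and real batch; how the gap itself is estimated is outside the scope of this sketch.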
Author Information
Haoyi Niu (Tsinghua University)
Shubham Sharma (IIT Bombay)
Yiwen Qiu (Tsinghua University)
Ming Li (Tsinghua University)
Guyue Zhou (Tsinghua University)
Jianming Hu (Tsinghua University)
Xianyuan Zhan (Tsinghua University)
Related Events (a corresponding poster, oral, or spotlight)
-
2022 Poster: When to Trust Your Simulator: Dynamics-Aware Hybrid Offline-and-Online Reinforcement Learning »
More from the Same Authors
-
2021 : Discriminator-Weighted Offline Imitation Learning from Suboptimal Demonstrations »
Haoran Xu · Xianyuan Zhan · Honglei Yin · -
2021 : Offline Reinforcement Learning with Soft Behavior Regularization »
Haoran Xu · Xianyuan Zhan · Li Jianxiong · Honglei Yin -
2021 : Model-Based Offline Planning with Trajectory Pruning »
Xianyuan Zhan · Xiangyu Zhu · Haoran Xu -
2022 Poster: A Policy-Guided Imitation Approach for Offline Reinforcement Learning »
Haoran Xu · Li Jiang · Li Jianxiong · Xianyuan Zhan -
2022 Poster: TOIST: Task Oriented Instance Segmentation Transformer with Noun-Pronoun Distillation »
Pengfei Li · Beiwen Tian · Yongliang Shi · Xiaoxue Chen · Hao Zhao · Guyue Zhou · Ya-Qin Zhang -
2022 Poster: SNAKE: Shape-aware Neural 3D Keypoint Field »
Chengliang Zhong · Peixing You · Xiaoxue Chen · Hao Zhao · Fuchun Sun · Guyue Zhou · Xiaodong Mu · Chuang Gan · Wenbing Huang -
2022 : A Versatile and Efficient Reinforcement Learning Approach for Autonomous Driving »
Guan Wang · Haoyi Niu · Desheng Zhu · Jianming Hu · Xianyuan Zhan · Guyue Zhou -
2022 : Sparse Q-Learning: Offline Reinforcement Learning with Implicit Value Regularization »
Haoran Xu · Li Jiang · Li Jianxiong · Zhuoran Yang · Zhaoran Wang · Xianyuan Zhan -
2022 : Distance-Sensitive Offline Reinforcement Learning »
Li Jianxiong · Xianyuan Zhan · Haoran Xu · Xiangyu Zhu · Jingjing Liu · Ya-Qin Zhang -
2022 Spotlight: Lightning Talks 6A-3 »
Junyu Xie · Chengliang Zhong · Ali Ayub · Sravanti Addepalli · Harsh Rangwani · Jiapeng Tang · Yuchen Rao · Zhiying Jiang · Yuqi Wang · Xingzhe He · Gene Chou · Ilya Chugunov · Samyak Jain · Yuntao Chen · Weidi Xie · Sumukh K Aithal · Carter Fendley · Lev Markhasin · Yiqin Dai · Peixing You · Bastian Wandt · Yinyu Nie · Helge Rhodin · Felix Heide · Ji Xin · Angela Dai · Andrew Zisserman · Bi Wang · Xiaoxue Chen · Mayank Mishra · ZHAO-XIANG ZHANG · Venkatesh Babu R · Justus Thies · Ming Li · Hao Zhao · Venkatesh Babu R · Jimmy Lin · Fuchun Sun · Matthias Niessner · Guyue Zhou · Xiaodong Mu · Chuang Gan · Wenbing Huang -
2022 Spotlight: SNAKE: Shape-aware Neural 3D Keypoint Field »
Chengliang Zhong · Peixing You · Xiaoxue Chen · Hao Zhao · Fuchun Sun · Guyue Zhou · Xiaodong Mu · Chuang Gan · Wenbing Huang -
2022 Spotlight: Lightning Talks 5A-1 »
Yao Mu · Jin Zhang · Haoyi Niu · Rui Yang · Mingdong Wu · Ze Gong · Shubham Sharma · Chenjia Bai · Yu ("Tony") Zhang · Siyuan Li · Yuzheng Zhuang · Fangwei Zhong · Yiwen Qiu · Xiaoteng Ma · Fei Ni · Yulong Xia · Chongjie Zhang · Hao Dong · Ming Li · Zhaoran Wang · Bin Wang · Chongjie Zhang · Jianyu Chen · Guyue Zhou · Lei Han · Jianming Hu · Jianye Hao · Xianyuan Zhan · Ping Luo