RORL: Robust Offline Reinforcement Learning via Conservative Smoothing
Rui Yang · Chenjia Bai · Xiaoteng Ma · Zhaoran Wang · Chongjie Zhang · Lei Han

Thu Dec 08 09:00 AM -- 11:00 AM (PST)

Offline reinforcement learning (RL) provides a promising direction to exploit massive amounts of offline data for complex decision-making tasks. Due to the distribution shift issue, current offline RL algorithms are generally designed to be conservative in value estimation and action selection. However, such conservatism can impair the robustness of learned policies when encountering observation deviation under realistic conditions, such as sensor errors and adversarial attacks. To trade off robustness and conservatism, we propose Robust Offline Reinforcement Learning (RORL) with a novel conservative smoothing technique. In RORL, we explicitly introduce regularization on the policy and the value function for states near the dataset, as well as additional conservative value estimation on these states. Theoretically, we show that RORL enjoys a tighter suboptimality bound than recent theoretical results in linear MDPs. We demonstrate that RORL can achieve state-of-the-art performance on the general offline RL benchmark and is considerably robust to adversarial observation perturbations.
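
To make "regularization on the value function for states near the dataset" concrete, here is a minimal sketch of one way such a smoothing term could look. It is not the paper's exact objective. It assumes that "near the dataset" means an L-infinity ball of radius `epsilon` around each dataset state, that nearby states are drawn by uniform sampling, and that the penalty is the largest squared gap between Q(s, a) and Q(s_hat, a) over the samples. The names `sample_perturbed_states`, `smoothing_penalty`, and `q_fn` are hypothetical, and the linear Q-function in the demo is only a stand-in for a learned critic.

```python
import numpy as np


def sample_perturbed_states(states, epsilon, n_samples, rng):
    """Sample n_samples points per state, uniformly from the L-infinity
    ball of radius epsilon around it (an assumed notion of "near")."""
    noise = rng.uniform(-epsilon, epsilon, size=(n_samples,) + states.shape)
    return states[None, :, :] + noise  # shape: (n_samples, batch, state_dim)


def smoothing_penalty(q_fn, states, actions, epsilon, n_samples, rng):
    """Illustrative smoothing term: for each (s, a) pair, take the largest
    squared gap between Q(s, a) and Q(s_hat, a) over the sampled s_hat,
    then average over the batch."""
    q_clean = q_fn(states, actions)  # shape: (batch,)
    perturbed = sample_perturbed_states(states, epsilon, n_samples, rng)
    q_pert = np.stack([q_fn(p, actions) for p in perturbed])  # (n_samples, batch)
    gaps = (q_pert - q_clean[None, :]) ** 2
    return gaps.max(axis=0).mean()


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    states = rng.normal(size=(4, 3))   # toy batch of dataset states
    actions = rng.normal(size=(4, 2))  # toy batch of dataset actions
    w, v = np.array([1.0, -2.0, 0.5]), np.array([0.3, -0.7])

    def linear_q(s, a):
        # Stand-in critic: Q(s, a) = s . w + a . v
        return s @ w + a @ v

    print(smoothing_penalty(linear_q, states, actions, 0.1, 10, rng))
```

In a training loop, a term like this would be added to the usual Bellman loss with a weighting coefficient. The policy regularization and the additional conservative value estimation on perturbed states mentioned in the abstract are not sketched here.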

Author Information

Rui Yang (Hong Kong University of Science and Technology)

I’m a first-year Ph.D. student in CSE at the Hong Kong University of Science and Technology, supervised by Prof. Tong Zhang. I received my master’s and bachelor’s degrees from the Department of Automation at Tsinghua University. My research interests lie in deep reinforcement learning (RL), especially goal-conditioned RL, offline RL, and model-based RL. I’m also interested in the application of RL algorithms to game AI and robotics.

Chenjia Bai (Shanghai AI Laboratory)
Xiaoteng Ma (Department of Automation, Tsinghua University)
Zhaoran Wang (Northwestern University)
Chongjie Zhang (Tsinghua University)
Lei Han (Tencent AI Lab)
