Timezone: »
State Advantage Weighting for Offline RL
Jiafei Lyu · aicheng Gong · Le Wan · Zongqing Lu · Xiu Li
Event URL: https://openreview.net/forum?id=2rOD_UQfvl »
We present \textit{state advantage weighting} for offline reinforcement learning (RL). In contrast to action advantage $A(s,a)$ that we commonly adopt in QSA learning, we leverage state advantage $A(s,s^\prime)$ and QSS learning for offline RL, hence decoupling the action from values. We expect the agent can get to the high-reward state and the action is determined by how the agent can get to that corresponding state. Experiments on D4RL datasets show that our proposed method can achieve remarkable performance against the common baselines. Furthermore, our method shows good generalization capability when transferring from offline to online.
We present \textit{state advantage weighting} for offline reinforcement learning (RL). In contrast to action advantage $A(s,a)$ that we commonly adopt in QSA learning, we leverage state advantage $A(s,s^\prime)$ and QSS learning for offline RL, hence decoupling the action from values. We expect the agent can get to the high-reward state and the action is determined by how the agent can get to that corresponding state. Experiments on D4RL datasets show that our proposed method can achieve remarkable performance against the common baselines. Furthermore, our method shows good generalization capability when transferring from offline to online.
Author Information
Jiafei Lyu (Tsinghua University, Tsinghua University)
aicheng Gong (Electronic Engineering, Tsinghua University, Tsinghua University)
Le Wan (Jilin University)
Zongqing Lu (Peking University)
Xiu Li
More from the Same Authors
-
2021 : MHER: Model-based Hindsight Experience Replay »
Yang Rui · Meng Fang · Lei Han · Yali Du · Feng Luo · Xiu Li -
2022 Poster: Model-Based Opponent Modeling »
XiaoPeng Yu · Jiechuan Jiang · Wanpeng Zhang · Haobin Jiang · Zongqing Lu -
2022 Poster: Learning to Share in Networked Multi-Agent Reinforcement Learning »
Yuxuan Yi · Ge Li · Yaowei Wang · Zongqing Lu -
2022 Poster: Double Check Your State Before Trusting It: Confidence-Aware Bidirectional Offline Model-Based Imagination »
Jiafei Lyu · Xiu Li · Zongqing Lu -
2022 Poster: OrdinalCLIP: Learning Rank Prompts for Language-Guided Ordinal Regression »
Wanhua Li · Xiaoke Huang · Zheng Zhu · Yansong Tang · Xiu Li · Jie Zhou · Jiwen Lu -
2022 Poster: I2Q: A Fully Decentralized Q-Learning Algorithm »
Jiechuan Jiang · Zongqing Lu -
2022 Poster: Mildly Conservative Q-Learning for Offline Reinforcement Learning »
Jiafei Lyu · Xiaoteng Ma · Xiu Li · Zongqing Lu -
2022 Poster: Towards Human-Level Bimanual Dexterous Manipulation with Reinforcement Learning »
Yuanpei Chen · Tianhao Wu · Shengjie Wang · Xidong Feng · Jiechuan Jiang · Zongqing Lu · Stephen McAleer · Hao Dong · Song-Chun Zhu · Yaodong Yang -
2022 : Emergent collective intelligence from massive-agent cooperation and competition »
Hanmo Chen · Stone Tao · JIAXIN CHEN · Weihan Shen · Xihui Li · Chenghui Yu · Sikai Cheng · Xiaolong Zhu · Xiu Li -
2022 Spotlight: Mildly Conservative Q-Learning for Offline Reinforcement Learning »
Jiafei Lyu · Xiaoteng Ma · Xiu Li · Zongqing Lu -
2022 Spotlight: Double Check Your State Before Trusting It: Confidence-Aware Bidirectional Offline Model-Based Imagination »
Jiafei Lyu · Xiu Li · Zongqing Lu -
2022 Spotlight: Towards Human-Level Bimanual Dexterous Manipulation with Reinforcement Learning »
Yuanpei Chen · Tianhao Wu · Shengjie Wang · Xidong Feng · Jiechuan Jiang · Zongqing Lu · Stephen McAleer · Hao Dong · Song-Chun Zhu · Yaodong Yang -
2020 Poster: Learning Individually Inferred Communication for Multi-Agent Cooperation »
gang Ding · Tiejun Huang · Zongqing Lu -
2020 Oral: Learning Individually Inferred Communication for Multi-Agent Cooperation »
gang Ding · Tiejun Huang · Zongqing Lu -
2019 Poster: Learning Fairness in Multi-Agent Systems »
Jiechuan Jiang · Zongqing Lu -
2018 Poster: Learning Attentional Communication for Multi-Agent Cooperation »
Jiechuan Jiang · Zongqing Lu