Offline reinforcement learning (RL), which aims to learn an optimal policy from a previously collected static dataset, is an important paradigm of RL. Standard RL methods often perform poorly in this regime because of function approximation errors on out-of-distribution actions. While a variety of regularization methods have been proposed to mitigate this issue, they are often constrained by policy classes with limited expressiveness, which can lead to highly suboptimal solutions. In this paper, we propose representing the policy as a diffusion model, a recent class of highly expressive deep generative models. We introduce Diffusion Q-learning (Diffusion-QL), which uses a conditional diffusion model to represent the policy. In our approach, we learn an action-value function and add a term that maximizes action-values to the training loss of the conditional diffusion model, which yields a loss that seeks optimal actions near the behavior policy. We show that the expressiveness of the diffusion model-based policy and the coupling of behavior cloning and policy improvement under the diffusion model both contribute to the outstanding performance of Diffusion-QL. We illustrate the superiority of our method over prior work in a simple 2D bandit example with a multimodal behavior policy. We then show that our method achieves state-of-the-art performance on the majority of the D4RL benchmark tasks.
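The combined training loss described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes the behavior-cloning term is a mean squared error between predicted and true noise (a common diffusion-model objective that the abstract does not spell out), and that the action-value term is weighted by a hypothetical coefficient `alpha`. The function name `combined_loss` and all argument names are illustrative.

```python
def combined_loss(eps_pred, eps_true, q_values, alpha=1.0):
    """Toy combined objective: behavior cloning minus weighted action-values.

    eps_pred, eps_true: lists of equal-length float lists (one row per sample),
        the predicted and true noise for noised dataset actions (assumed form).
    q_values: list of floats, Q(s, a) for actions sampled from the policy.
    alpha: hypothetical weight on the action-value term.
    """
    # Behavior-cloning term: mean squared noise-prediction error,
    # which keeps generated actions close to the dataset actions.
    n = sum(len(row) for row in eps_true)
    bc = sum(
        (p - t) ** 2
        for pred_row, true_row in zip(eps_pred, eps_true)
        for p, t in zip(pred_row, true_row)
    ) / n

    # Policy-improvement term: minimizing the negative mean Q-value
    # pushes sampled actions toward higher estimated value.
    q_term = -sum(q_values) / len(q_values)

    return bc + alpha * q_term


# One sample with a 2D action: squared error 1.0 over 2 entries, Q = 2.0.
loss = combined_loss([[1.0, 0.0]], [[0.0, 0.0]], [2.0], alpha=0.5)
```

In this sketch a larger `alpha` puts more weight on high action-values relative to staying near the behavior policy. In a real training loop both terms would be differentiable functions of the policy parameters.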
Author Information
Zhendong Wang (University of Texas, Austin)
jonathan j hunt (.)
Mingyuan Zhou (University of Texas at Austin)
More from the Same Authors
-
2022 Poster: Knowledge-Aware Bayesian Deep Topic Model »
Dongsheng Wang · Yishi Xu · Miaoge Li · Zhibin Duan · Chaojie Wang · Bo Chen · Mingyuan Zhou -
2022 Poster: HyperMiner: Topic Taxonomy Mining with Hyperbolic Embedding »
Yishi Xu · Dongsheng Wang · Bo Chen · Ruiying Lu · Zhibin Duan · Mingyuan Zhou -
2022 : Fantastic Rewards and How to Tame Them: A Case Study on Reward Learning for Task-Oriented Dialogue Systems »
Yihao Feng · Shentao Yang · Shujian Zhang · Jianguo Zhang · Caiming Xiong · Mingyuan Zhou · Huan Wang -
2022 : Hyperbolic Deep Reinforcement Learning »
Edoardo Cetin · Benjamin Chamberlain · Michael Bronstein · jonathan j hunt -
2022 Spotlight: Lightning Talks 5B-4 »
Yuezhi Yang · Zeyu Yang · Yong Lin · Yishi Xu · Linan Yue · Tao Yang · Weixin Chen · Qi Liu · Jiaqi Chen · Dongsheng Wang · Baoyuan Wu · Yuwang Wang · Hao Pan · Shengyu Zhu · Zhenwei Miao · Yan Lu · Lu Tan · Bo Chen · Yichao Du · Haoqian Wang · Wei Li · Yanqing An · Ruiying Lu · Peng Cui · Nanning Zheng · Li Wang · Zhibin Duan · Xiatian Zhu · Mingyuan Zhou · Enhong Chen · Li Zhang -
2022 Spotlight: HyperMiner: Topic Taxonomy Mining with Hyperbolic Embedding »
Yishi Xu · Dongsheng Wang · Bo Chen · Ruiying Lu · Zhibin Duan · Mingyuan Zhou -
2022 Spotlight: Lightning Talks 2A-4 »
Sarthak Mittal · Richard Grumitt · Zuoyu Yan · Lihao Wang · Dongsheng Wang · Alexander Korotin · Jiangxin Sun · Ankit Gupta · Vage Egiazarian · Tengfei Ma · Yi Zhou · Yishi Xu · Albert Gu · Biwei Dai · Chunyu Wang · Yoshua Bengio · Uros Seljak · Miaoge Li · Guillaume Lajoie · Yiqun Wang · Liangcai Gao · Lingxiao Li · Jonathan Berant · Huang Hu · Xiaoqing Zheng · Zhibin Duan · Hanjiang Lai · Evgeny Burnaev · Zhi Tang · Zhi Jin · Xuanjing Huang · Chaojie Wang · Yusu Wang · Jian-Fang Hu · Bo Chen · Chao Chen · Hao Zhou · Mingyuan Zhou -
2022 Spotlight: Knowledge-Aware Bayesian Deep Topic Model »
Dongsheng Wang · Yishi Xu · Miaoge Li · Zhibin Duan · Chaojie Wang · Bo Chen · Mingyuan Zhou -
2022 Poster: Learning to Re-weight Examples with Optimal Transport for Imbalanced Classification »
Dandan Guo · Zhuo Li · meixi zheng · He Zhao · Mingyuan Zhou · Hongyuan Zha -
2022 Poster: Adaptive Distribution Calibration for Few-Shot Learning with Hierarchical Optimal Transport »
Dandan Guo · Long Tian · He Zhao · Mingyuan Zhou · Hongyuan Zha -
2022 Poster: Alleviating "Posterior Collapse" in Deep Topic Models via Policy Gradient »
Yewen Li · Chaojie Wang · Zhibin Duan · Dongsheng Wang · Bo Chen · Bo An · Mingyuan Zhou -
2022 Poster: A Variational Edge Partition Model for Supervised Graph Representation Learning »
Yilin He · Chaojie Wang · Hao Zhang · Bo Chen · Mingyuan Zhou -
2022 Poster: A Unified Framework for Alternating Offline Model Training and Policy Learning »
Shentao Yang · Shujian Zhang · Yihao Feng · Mingyuan Zhou -
2022 Poster: CARD: Classification and Regression Diffusion Models »
Xizewen Han · Huangjie Zheng · Mingyuan Zhou -
2021 Poster: Exploiting Chain Rule and Bayes' Theorem to Compare Probability Distributions »
Huangjie Zheng · Mingyuan Zhou -
2021 Poster: Alignment Attention by Matching Key and Query Distributions »
Shujian Zhang · Xinjie Fan · Huangjie Zheng · Korawat Tanwisuth · Mingyuan Zhou -
2021 Poster: Probabilistic Margins for Instance Reweighting in Adversarial Training »
qizhou wang · Feng Liu · Bo Han · Tongliang Liu · Chen Gong · Gang Niu · Mingyuan Zhou · Masashi Sugiyama -
2021 Poster: Convex Polytope Trees »
Mohammadreza Armandpour · Ali Sadeghian · Mingyuan Zhou -
2021 Poster: TopicNet: Semantic Graph-Guided Topic Discovery »
Zhibin Duan · Yishi Xu · Bo Chen · Dongsheng Wang · Chaojie Wang · Mingyuan Zhou -
2021 Poster: A Prototype-Oriented Framework for Unsupervised Domain Adaptation »
Korawat Tanwisuth · Xinjie Fan · Huangjie Zheng · Shujian Zhang · Hao Zhang · Bo Chen · Mingyuan Zhou -
2021 Poster: CARMS: Categorical-Antithetic-REINFORCE Multi-Sample Gradient Estimator »
Alek Dimitriev · Mingyuan Zhou -
2020 Poster: Bidirectional Convolutional Poisson Gamma Dynamical Systems »
wenchao chen · Chaojie Wang · Bo Chen · Yicheng Liu · Hao Zhang · Mingyuan Zhou -
2020 Poster: Implicit Distributional Reinforcement Learning »
Yuguang Yue · Zhendong Wang · Mingyuan Zhou -
2020 Poster: Deep Relational Topic Modeling via Graph Poisson Gamma Belief Network »
Chaojie Wang · Hao Zhang · Bo Chen · Dongsheng Wang · Zhengjue Wang · Mingyuan Zhou -
2020 Poster: Bayesian Attention Modules »
Xinjie Fan · Shujian Zhang · Bo Chen · Mingyuan Zhou -
2019 Poster: Variational Graph Recurrent Neural Networks »
Ehsan Hajiramezanali · Arman Hasanzadeh · Krishna Narayanan · Nick Duffield · Mingyuan Zhou · Xiaoning Qian -
2019 Poster: Semi-Implicit Graph Variational Auto-Encoders »
Arman Hasanzadeh · Ehsan Hajiramezanali · Krishna Narayanan · Nick Duffield · Mingyuan Zhou · Xiaoning Qian -
2019 Poster: The Option Keyboard: Combining Skills in Reinforcement Learning »
Andre Barreto · Diana Borsa · Shaobo Hou · Gheorghe Comanici · Eser Aygün · Philippe Hamel · Daniel Toyama · jonathan j hunt · Shibl Mourad · David Silver · Doina Precup -
2019 Poster: Poisson-Randomized Gamma Dynamical Systems »
Aaron Schein · Scott Linderman · Mingyuan Zhou · David Blei · Hanna Wallach -
2018 Poster: Nonparametric Bayesian Lomax delegate racing for survival analysis with competing risks »
Quan Zhang · Mingyuan Zhou -
2018 Poster: Deep Poisson gamma dynamical systems »
Dandan Guo · Bo Chen · Hao Zhang · Mingyuan Zhou -
2018 Poster: Dirichlet belief networks for topic structure learning »
He Zhao · Lan Du · Wray Buntine · Mingyuan Zhou -
2018 Poster: Parsimonious Bayesian deep networks »
Mingyuan Zhou -
2018 Poster: Masking: A New Perspective of Noisy Supervision »
Bo Han · Jiangchao Yao · Gang Niu · Mingyuan Zhou · Ivor Tsang · Ya Zhang · Masashi Sugiyama -
2018 Poster: Bayesian multi-domain learning for cancer subtype discovery from next-generation sequencing count data »
Ehsan Hajiramezanali · Siamak Zamani Dadaneh · Alireza Karbalayghareh · Mingyuan Zhou · Xiaoning Qian -
2016 Poster: Poisson-Gamma dynamical systems »
Aaron Schein · Hanna Wallach · Mingyuan Zhou -
2016 Oral: Poisson-Gamma dynamical systems »
Aaron Schein · Hanna Wallach · Mingyuan Zhou -
2015 Poster: The Poisson Gamma Belief Network »
Mingyuan Zhou · Yulai Cong · Bo Chen -
2014 Poster: Beta-Negative Binomial Process and Exchangeable Random Partitions for Mixed-Membership Modeling »
Mingyuan Zhou -
2012 Poster: Augment-and-Conquer Negative Binomial Processes »
Mingyuan Zhou · Lawrence Carin -
2012 Spotlight: Augment-and-Conquer Negative Binomial Processes »
Mingyuan Zhou · Lawrence Carin -
2009 Poster: Non-Parametric Bayesian Dictionary Learning for Sparse Image Representations »
Mingyuan Zhou · Haojun Chen · John Paisley · Lu Ren · Guillermo Sapiro · Lawrence Carin -
2009 Oral: Non-Parametric Bayesian Dictionary Learning for Sparse Image Representations »
Mingyuan Zhou · Haojun Chen · John Paisley · Lu Ren · Guillermo Sapiro · Larry Carin