Poster
A Simple Decentralized Cross-Entropy Method
Zichen Zhang · Jun Jin · Martin Jagersand · Jun Luo · Dale Schuurmans
The Cross-Entropy Method (CEM) is commonly used for planning in model-based reinforcement learning (MBRL), where a centralized approach is typically used to update the sampling distribution based only on the top-$k$ samples. In this paper, we show that such a centralized approach makes CEM vulnerable to local optima, thus impairing its sample efficiency. To tackle this issue, we propose Decentralized CEM (DecentCEM), a simple but effective improvement over classical CEM that uses an ensemble of CEM instances running independently of one another, each performing a local improvement of its own sampling distribution. We provide both theoretical and empirical analysis to demonstrate the effectiveness of this simple decentralized approach. We empirically show that, compared to the classical centralized approach using either a single Gaussian or a mixture of Gaussian distributions, DecentCEM finds the global optimum much more consistently and thus improves sample efficiency. Furthermore, we plug DecentCEM into the planning problem of MBRL and evaluate our approach in several continuous control environments, comparing against state-of-the-art CEM-based MBRL approaches (PETS and POPLIN). Results show that simply replacing the classical CEM module with our DecentCEM module improves sample efficiency, at the cost of a reasonable amount of additional computation. Lastly, we conduct ablation studies for more in-depth analysis. Code is available at https://github.com/vincentzhang/decentCEM.
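To make the contrast between the centralized and decentralized updates concrete, here is a minimal NumPy sketch on a one-dimensional toy problem. It is not the authors' implementation (see the repository linked above for that). The function names `cem` and `decent_cem`, the bimodal `objective`, the even split of the sample budget across instances, and all hyperparameter values are assumptions made for illustration only.

```python
import numpy as np

def objective(x):
    # Toy bimodal objective (an assumption for this sketch):
    # global optimum near x = 2, weaker local optimum near x = -2.
    return np.exp(-(x - 2.0) ** 2) + 0.5 * np.exp(-(x + 2.0) ** 2)

def cem(f, mean, std, rng, n_samples=100, k=10, n_iters=20):
    """Classical CEM with one Gaussian: refit it to the top-k samples each iteration."""
    for _ in range(n_iters):
        samples = rng.normal(mean, std, size=n_samples)
        scores = f(samples)
        elites = samples[np.argsort(scores)[-k:]]  # top-k samples by score
        mean = elites.mean()
        std = elites.std() + 1e-6  # small floor keeps the scale positive
    return float(mean), float(f(np.array([mean]))[0])

def decent_cem(f, means, std, rng, n_samples=200, k=20, n_iters=20):
    """Ensemble of independent CEM instances, each refining its own Gaussian.

    The total sample budget is split evenly across instances (an assumption),
    and the solution of the best-scoring instance is returned.
    """
    n_inst = len(means)
    results = [
        cem(f, m, std, rng, n_samples // n_inst, max(1, k // n_inst), n_iters)
        for m in means
    ]
    return max(results, key=lambda r: r[1])

rng = np.random.default_rng(0)
# Four instances started at different locations; only one starts near the global optimum.
best_x, best_val = decent_cem(objective, means=[-3.0, -1.0, 0.0, 2.0], std=1.0, rng=rng)
print(best_x, best_val)
```

Because each instance fits only its own elites, an instance that settles near the weaker mode at x = -2 does not pull the others toward it. The final selection over instances then picks whichever instance reached the highest objective value.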
Author Information
Zichen Zhang (University of Alberta)
Jun Jin (University of Alberta)
Martin Jagersand (University of Alberta)
Jun Luo (Huawei Technologies Ltd.)
Dale Schuurmans (Google Brain & University of Alberta)
More from the Same Authors
-
2021 Spotlight: Combiner: Full Attention Transformer with Sparse Computation Cost »
Hongyu Ren · Hanjun Dai · Zihang Dai · Mengjiao (Sherry) Yang · Jure Leskovec · Dale Schuurmans · Bo Dai -
2021 : Offline Policy Selection under Uncertainty »
Mengjiao (Sherry) Yang · Bo Dai · Ofir Nachum · George Tucker · Dale Schuurmans -
2022 Poster: Multiagent Q-learning with Sub-Team Coordination »
Wenhan Huang · Kai Li · Kun Shao · Tianze Zhou · Matthew Taylor · Jun Luo · Dongge Wang · Hangyu Mao · Jianye Hao · Jun Wang · Xiaotie Deng -
2022 : Build generally reusable agent-environment interaction models »
Jun Jin · Hongming Zhang · Jun Luo -
2022 Spotlight: Lightning Talks 5A-3 »
Minting Pan · Xiang Chen · Wenhan Huang · Can Chang · Zhecheng Yuan · Jianzhun Shao · Yushi Cao · Peihao Chen · Ke Xue · Zhengrong Xue · Zhiqiang Lou · Xiangming Zhu · Lei Li · Zhiming Li · Kai Li · Jiacheng Xu · Dongyu Ji · Ni Mu · Kun Shao · Tianpei Yang · Kunyang Lin · Ningyu Zhang · Yunbo Wang · Lei Yuan · Bo Yuan · Hongchang Zhang · Jiajun Wu · Tianze Zhou · Xueqian Wang · Ling Pan · Yuhang Jiang · Xiaokang Yang · Xiaozhuan Liang · Hao Zhang · Weiwen Hu · Miqing Li · YAN ZHENG · Matthew Taylor · Huazhe Xu · Shumin Deng · Chao Qian · YI WU · Shuncheng He · Wenbing Huang · Chuanqi Tan · Zongzhang Zhang · Yang Gao · Jun Luo · Yi Li · Xiangyang Ji · Thomas Li · Mingkui Tan · Fei Huang · Yang Yu · Huazhe Xu · Dongge Wang · Jianye Hao · Chuang Gan · Yang Liu · Luo Si · Hangyu Mao · Huajun Chen · Jianye Hao · Jun Wang · Xiaotie Deng -
2022 Spotlight: Multiagent Q-learning with Sub-Team Coordination »
Wenhan Huang · Kai Li · Kun Shao · Tianze Zhou · Matthew Taylor · Jun Luo · Dongge Wang · Hangyu Mao · Jianye Hao · Jun Wang · Xiaotie Deng -
2022 Poster: Chain of Thought Imitation with Procedure Cloning »
Mengjiao (Sherry) Yang · Dale Schuurmans · Pieter Abbeel · Ofir Nachum -
2022 Poster: Optimal Scaling for Locally Balanced Proposals in Discrete Spaces »
Haoran Sun · Hanjun Dai · Dale Schuurmans -
2022 Poster: The Role of Baselines in Policy Gradient Optimization »
Jincheng Mei · Wesley Chung · Valentin Thomas · Bo Dai · Csaba Szepesvari · Dale Schuurmans -
2022 Poster: Chain-of-Thought Prompting Elicits Reasoning in Large Language Models »
Jason Wei · Xuezhi Wang · Dale Schuurmans · Maarten Bosma · brian ichter · Fei Xia · Ed Chi · Quoc V Le · Denny Zhou -
2022 Poster: On the Global Convergence Rates of Decentralized Softmax Gradient Play in Markov Potential Games »
Runyu Zhang · Jincheng Mei · Bo Dai · Dale Schuurmans · Na Li -
2021 : Reward and State Design: Towards Learning to Teach »
Alex Lewandowski · Calarina Muslimani · Matthew Taylor · Jun Luo -
2021 : Dale Schuurmans Talk Q&A »
Dale Schuurmans -
2021 : Invited Talk: Dale Schuurmans - Understanding Deep Value Estimation »
Dale Schuurmans -
2021 Poster: Combiner: Full Attention Transformer with Sparse Computation Cost »
Hongyu Ren · Hanjun Dai · Zihang Dai · Mengjiao (Sherry) Yang · Jure Leskovec · Dale Schuurmans · Bo Dai -
2021 Poster: Understanding the Effect of Stochasticity in Policy Optimization »
Jincheng Mei · Bo Dai · Chenjun Xiao · Csaba Szepesvari · Dale Schuurmans -
2019 : Poster Spotlights »
Théophile Griveau-Billion · Rahul Singh · Zichen Zhang · Ciarán Lee · Jesse Krijthe · Grace Charles · Vira Semenova · Rahul Ladhania · Miruna Oprescu -
2019 : Coffee break, posters, and 1-on-1 discussions »
Yangyi Lu · Daniel Chen · Hongseok Namkoong · Marie Charpignon · Maja Rudolph · Amanda Coston · Julius von Kügelgen · Niranjani Prasad · Paramveer Dhillon · Yunzong Xu · Yixin Wang · Alexander Markham · David Rohde · Rahul Singh · Zichen Zhang · Negar Hassanpour · Ankit Sharma · Ciarán Lee · Jean Pouget-Abadie · Jesse Krijthe · Divyat Mahajan · Nan Rosemary Ke · Peter Wirnsberger · Vira Semenova · Dmytro Mykhaylov · Dennis Shen · Kenta Takatsu · Liyang Sun · Jeremy Yang · Alexander Franks · Pak Kan Wong · Tauhid Zaman · Shira Mitchell · min kyoung kang · Qi Yang -
2019 : Poster and Coffee Break 1 »
Aaron Sidford · Aditya Mahajan · Alejandro Ribeiro · Alex Lewandowski · Ali H Sayed · Ambuj Tewari · Angelika Steger · Anima Anandkumar · Asier Mujika · Hilbert J Kappen · Bolei Zhou · Byron Boots · Chelsea Finn · Chen-Yu Wei · Chi Jin · Ching-An Cheng · Christina Yu · Clement Gehring · Craig Boutilier · Dahua Lin · Daniel McNamee · Daniel Russo · David Brandfonbrener · Denny Zhou · Devesh Jha · Diego Romeres · Doina Precup · Dominik Thalmeier · Eduard Gorbunov · Elad Hazan · Elena Smirnova · Elvis Dohmatob · Emma Brunskill · Enrique Munoz de Cote · Ethan Waldie · Florian Meier · Florian Schaefer · Ge Liu · Gergely Neu · Haim Kaplan · Hao Sun · Hengshuai Yao · Jalaj Bhandari · James A Preiss · Jayakumar Subramanian · Jiajin Li · Jieping Ye · Jimmy Smith · Joan Bas Serrano · Joan Bruna · John Langford · Jonathan Lee · Jose A. Arjona-Medina · Kaiqing Zhang · Karan Singh · Yuping Luo · Zafarali Ahmed · Zaiwei Chen · Zhaoran Wang · Zhizhong Li · Zhuoran Yang · Ziping Xu · Ziyang Tang · Yi Mao · David Brandfonbrener · Shirli Di-Castro · Riashat Islam · Zuyue Fu · Abhishek Naik · Saurabh Kumar · Benjamin Petit · Angeliki Kamoutsi · Simone Totaro · Arvind Raghunathan · Rui Wu · Donghwan Lee · Dongsheng Ding · Alec Koppel · Hao Sun · Christian Tjandraatmadja · Mahdi Karami · Jincheng Mei · Chenjun Xiao · Junfeng Wen · Zichen Zhang · Ross Goroshin · Mohammad Pezeshki · Jiaqi Zhai · Philip Amortila · Shuo Huang · Mariya Vasileva · El houcine Bergou · Adel Ahmadyan · Haoran Sun · Sheng Zhang · Lukas Gruber · Yuanhao Wang · Tetiana Parshakova -
2019 Poster: Maximum Entropy Monte-Carlo Planning »
Chenjun Xiao · Ruitong Huang · Jincheng Mei · Dale Schuurmans · Martin Müller -
2019 Poster: Surrogate Objectives for Batch Policy Optimization in One-step Decision Making »
Minmin Chen · Ramki Gummadi · Chris Harris · Dale Schuurmans -
2019 Poster: Invertible Convolutional Flow »
Mahdi Karami · Dale Schuurmans · Jascha Sohl-Dickstein · Laurent Dinh · Daniel Duckworth -
2019 Spotlight: Invertible Convolutional Flow »
Mahdi Karami · Dale Schuurmans · Jascha Sohl-Dickstein · Laurent Dinh · Daniel Duckworth -
2018 : Off-policy Policy Optimization (Dale Schuurmans) »
Dale Schuurmans -
2017 Poster: Bridging the Gap Between Value and Policy Based Reinforcement Learning »
Ofir Nachum · Mohammad Norouzi · Kelvin Xu · Dale Schuurmans -
2017 Poster: Multi-view Matrix Factorization for Linear Dynamical System Estimation »
Mahdi Karami · Martha White · Dale Schuurmans · Csaba Szepesvari -
2016 Poster: Deep Learning Games »
Dale Schuurmans · Martin A Zinkevich -
2016 Poster: Reward Augmented Maximum Likelihood for Neural Structured Prediction »
Mohammad Norouzi · Samy Bengio · zhifeng Chen · Navdeep Jaitly · Mike Schuster · Yonghui Wu · Dale Schuurmans -
2015 Poster: Embedding Inference for Structured Multilabel Prediction »
Farzaneh Mirzazadeh · Siamak Ravanbakhsh · Nan Ding · Dale Schuurmans -
2014 Workshop: Representation and Learning Methods for Complex Outputs »
Richard Zemel · Dale Schuurmans · Kilian Q Weinberger · Yuhong Guo · Jia Deng · Francesco Dinuzzo · Hal Daumé III · Honglak Lee · Noah A Smith · Richard Sutton · Jiaqian YU · Vitaly Kuznetsov · Luke Vilnis · Hanchen Xiong · Calvin Murdock · Thomas Unterthiner · Jean-Francis Roy · Martin Renqiang Min · Hichem SAHBI · Fabio Massimo Zanzotto -
2014 Poster: Convex Deep Learning via Normalized Kernels »
Özlem Aslan · Xinhua Zhang · Dale Schuurmans -
2013 Workshop: Output Representation Learning »
Yuhong Guo · Dale Schuurmans · Richard Zemel · Samy Bengio · Yoshua Bengio · Li Deng · Dan Roth · Kilian Q Weinberger · Jason Weston · Kihyuk Sohn · Florent Perronnin · Gabriel Synnaeve · Pablo R Strasser · julien audiffren · Carlo Ciliberto · Dan Goldwasser -
2013 Poster: Convex Two-Layer Modeling »
Özlem Aslan · Hao Cheng · Xinhua Zhang · Dale Schuurmans -
2013 Spotlight: Convex Two-Layer Modeling »
Özlem Aslan · Hao Cheng · Xinhua Zhang · Dale Schuurmans -
2013 Poster: Polar Operators for Structured Sparse Estimation »
Xinhua Zhang · Yao-Liang Yu · Dale Schuurmans -
2012 Poster: Convex Multi-view Subspace Learning »
Martha White · Yao-Liang Yu · Xinhua Zhang · Dale Schuurmans -
2012 Poster: Accelerated Training for Matrix-norm Regularization: A Boosting Approach »
Xinhua Zhang · Yao-Liang Yu · Dale Schuurmans -
2012 Poster: A Polynomial-time Form of Robust Regression »
Yao-Liang Yu · Özlem Aslan · Dale Schuurmans -
2010 Poster: Relaxed Clipping: A Global Training Method for Robust Regression and Classification »
Yao-Liang Yu · Min Yang · Linli Xu · Martha White · Dale Schuurmans -
2009 Poster: Convex Relaxation of Mixture Regression with Efficient Algorithms »
Novi Quadrianto · Tiberio Caetano · John Lim · Dale Schuurmans -
2009 Poster: A General Projection Property for Distribution Families »
Yao-Liang Yu · Yuxi Li · Dale Schuurmans · Csaba Szepesvari -
2007 Spotlight: Stable Dual Dynamic Programming »
Tao Wang · Daniel Lizotte · Michael Bowling · Dale Schuurmans -
2007 Poster: Stable Dual Dynamic Programming »
Tao Wang · Daniel Lizotte · Michael Bowling · Dale Schuurmans -
2007 Session: Spotlights »
Dale Schuurmans -
2007 Poster: Convex Relaxations of EM »
Yuhong Guo · Dale Schuurmans -
2007 Poster: Discriminative Batch Mode Active Learning »
Yuhong Guo · Dale Schuurmans -
2006 Poster: Learning to Model Spatial Dependency: Semi-Supervised Discriminative Random Fields »
Chi-Hoon Lee · Shaojun Wang · Feng Jiao · Dale Schuurmans · Russell Greiner -
2006 Poster: Implicit Online Learning with Kernels »
Li Cheng · Vishwanathan S V N · Dale Schuurmans · Shaojun Wang · Terry Caelli