Timezone: »
Over the recent years, reinforcement learning (RL) starts to show promising results in tackling combinatorial optimization (CO) problems, in particular when coupled with curriculum learning to facilitate training. Despite emerging empirical evidence, theoretical study on why RL helps is still at its early stage. This paper presents the first systematic study on policy optimization methods for online CO problems. We show that online CO problems can be naturally formulated as latent Markov Decision Processes (LMDPs), and prove convergence bounds on natural policy gradient (NPG) for solving LMDPs. Furthermore, our theory explains the benefit of curriculum learning: it can find a strong sampling policy and reduce the distribution shift, a critical quantity that governs the convergence rate in our theorem. For a canonical online CO problem, Secretary Problem, we formally prove that distribution shift is reduced exponentially with curriculum learning even if the curriculum is randomly generated. Our theory also shows we can simplify the curriculum learning scheme used in prior work from multi-step to single-step. Lastly, we provide extensive experiments on Secretary Problem and Online Knapsack to verify our findings.
Author Information
Runlong Zhou (Paul G. Allen School of Computer Science & Engineering, University of Washington)
Yuandong Tian (Facebook AI Research)
YI WU (UC Berkeley)
Simon Du (University of Washington)
More from the Same Authors
-
2021 : Learning Design and Construction with Varying-Sized Materials via Prioritized Memory Resets »
Yunfei Li · Lei Li · YI WU -
2022 Poster: Grounded Reinforcement Learning: Learning to Win the Game under Human Commands »
Shusheng Xu · Huaijie Wang · YI WU -
2022 Poster: Provable General Function Class Representation Learning in Multitask Bandits and MDP »
Rui Lu · Andrew Zhao · Simon Du · Gao Huang -
2022 Poster: Pre-Trained Image Encoder for Generalizable Visual Reinforcement Learning »
Zhecheng Yuan · Zhengrong Xue · Bo Yuan · Xueqian Wang · YI WU · Yang Gao · Huazhe Xu -
2022 : Efficient Planning in a Compact Latent Action Space »
zhengyao Jiang · Tianjun Zhang · Michael Janner · Yueying (Lisa) Li · Tim Rocktäschel · Edward Grefenstette · Yuandong Tian -
2022 Spotlight: Lightning Talks 5A-3 »
Minting Pan · Xiang Chen · Wenhan Huang · Can Chang · Zhecheng Yuan · Jianzhun Shao · Yushi Cao · Peihao Chen · Ke Xue · Zhengrong Xue · Zhiqiang Lou · Xiangming Zhu · Lei Li · Zhiming Li · Kai Li · Jiacheng Xu · Dongyu Ji · Ni Mu · Kun Shao · Tianpei Yang · Kunyang Lin · Ningyu Zhang · Yunbo Wang · Lei Yuan · Bo Yuan · Hongchang Zhang · Jiajun Wu · Tianze Zhou · Xueqian Wang · Ling Pan · Yuhang Jiang · Xiaokang Yang · Xiaozhuan Liang · Hao Zhang · Weiwen Hu · Miqing Li · YAN ZHENG · Matthew Taylor · Huazhe Xu · Shumin Deng · Chao Qian · YI WU · Shuncheng He · Wenbing Huang · Chuanqi Tan · Zongzhang Zhang · Yang Gao · Jun Luo · Yi Li · Xiangyang Ji · Thomas Li · Mingkui Tan · Fei Huang · Yang Yu · Huazhe Xu · Dongge Wang · Jianye Hao · Chuang Gan · Yang Liu · Luo Si · Hangyu Mao · Huajun Chen · Jianye Hao · Jun Wang · Xiaotie Deng -
2022 Spotlight: Pre-Trained Image Encoder for Generalizable Visual Reinforcement Learning »
Zhecheng Yuan · Zhengrong Xue · Bo Yuan · Xueqian Wang · YI WU · Yang Gao · Huazhe Xu -
2022 Spotlight: Lightning Talks 4A-4 »
Yunhao Tang · LING LIANG · Thomas Chau · Daeha Kim · Junbiao Cui · Rui Lu · Lei Song · Byung Cheol Song · Andrew Zhao · Remi Munos · Łukasz Dudziak · Jiye Liang · Ke Xue · Kaidi Xu · Mark Rowland · Hongkai Wen · Xing Hu · Xiaobin Huang · Simon Du · Nicholas Lane · Chao Qian · Lei Deng · Bernardo Avila Pires · Gao Huang · Will Dabney · Mohamed Abdelfattah · Yuan Xie · Marc Bellemare -
2022 Panel: Panel 4B-3: Efficient Methods for… & Understanding Deep Contrastive… »
Zhi-Hua Zhou · Yuandong Tian -
2022 Spotlight: Provable General Function Class Representation Learning in Multitask Bandits and MDP »
Rui Lu · Andrew Zhao · Simon Du · Gao Huang -
2022 Poster: When are Offline Two-Player Zero-Sum Markov Games Solvable? »
Qiwen Cui · Simon Du -
2022 Poster: Learning in Congestion Games with Bandit Feedback »
Qiwen Cui · Zhihan Xiong · Maryam Fazel · Simon Du -
2022 Poster: DreamShard: Generalizable Embedding Table Placement for Recommender Systems »
Daochen Zha · Louis Feng · Qiaoyu Tan · Zirui Liu · Kwei-Herng Lai · Bhargav Bhushanam · Yuandong Tian · Arun Kejariwal · Xia Hu -
2022 Poster: The Surprising Effectiveness of PPO in Cooperative Multi-Agent Games »
Chao Yu · Akash Velu · Eugene Vinitsky · Jiaxuan Gao · Yu Wang · Alexandre Bayen · YI WU -
2022 Poster: Provably Efficient Offline Multi-agent Reinforcement Learning via Strategy-wise Bonus »
Qiwen Cui · Simon Du -
2022 Poster: On Gap-dependent Bounds for Offline Reinforcement Learning »
Xinqi Wang · Qiwen Cui · Simon Du -
2022 Poster: Understanding Deep Contrastive Learning via Coordinate-wise Optimization »
Yuandong Tian -
2022 Poster: Near-Optimal Randomized Exploration for Tabular Markov Decision Processes »
Zhihan Xiong · Ruoqi Shen · Qiwen Cui · Maryam Fazel · Simon Du -
2021 Workshop: Ecological Theory of Reinforcement Learning: How Does Task Design Influence Agent Learning? »
Manfred Díaz · Hiroki Furuta · Elise van der Pol · Lisa Lee · Shixiang (Shane) Gu · Pablo Samuel Castro · Simon Du · Marc Bellemare · Sergey Levine -
2020 Poster: Multi-Task Reinforcement Learning with Soft Modularization »
Ruihan Yang · Huazhe Xu · YI WU · Xiaolong Wang -
2018 : Poster Session »
Sujay Sanghavi · Vatsal Shah · Yanyao Shen · Tianchen Zhao · Yuandong Tian · Tomer Galanti · Mufan Li · Gilad Cohen · Daniel Rothchild · Aristide Baratin · Devansh Arpit · Vagelis Papalexakis · Michael Perlmutter · Ashok Vardhan Makkuva · Pim de Haan · Yingyan Lin · Wanmo Kang · Cheolhyoung Lee · Hao Shen · Sho Yaida · Dan Roberts · Nadav Cohen · Philippe Casgrain · Dejiao Zhang · Tengyu Ma · Avinash Ravichandran · Julian Emilio Salazar · Bo Li · Davis Liang · Christopher Wong · Glen Bigan Mbeng · Animesh Garg -
2018 Poster: Meta-Learning MCMC Proposals »
Tongzhou Wang · YI WU · Dave Moore · Stuart Russell -
2017 Poster: Multi-Agent Actor-Critic for Mixed Cooperative-Competitive Environments »
Ryan Lowe · YI WU · Aviv Tamar · Jean Harb · OpenAI Pieter Abbeel · Igor Mordatch -
2016 Poster: Value Iteration Networks »
Aviv Tamar · Sergey Levine · Pieter Abbeel · YI WU · Garrett Thomas -
2016 Oral: Value Iteration Networks »
Aviv Tamar · Sergey Levine · Pieter Abbeel · YI WU · Garrett Thomas -
2014 Workshop: 3rd NIPS Workshop on Probabilistic Programming »
Daniel Roy · Josh Tenenbaum · Thomas Dietterich · Stuart J Russell · YI WU · Ulrik R Beierholm · Alp Kucukelbir · Zenna Tavares · Yura Perov · Daniel Lee · Brian Ruttenberg · Sameer Singh · Michael Hughes · Marco Gaboardi · Alexey Radul · Vikash Mansinghka · Frank Wood · Sebastian Riedel · Prakash Panangaden