Poster
Provably Efficient Neural GTD for Off-Policy Learning
Hoi-To Wai · Zhuoran Yang · Zhaoran Wang · Mingyi Hong
This paper studies a gradient temporal difference (GTD) algorithm using neural network (NN) function approximators to minimize the mean squared Bellman error (MSBE). For off-policy learning, we show that the minimum MSBE problem can be recast into a min-max optimization involving a pair of over-parameterized primal-dual NNs. The resultant formulation can then be tackled using a neural GTD algorithm. We analyze the convergence of the proposed algorithm with a 2-layer ReLU NN architecture using $m$ neurons and prove that it computes an approximate optimal solution to the minimum MSBE problem as $m \rightarrow \infty$.
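The min-max recast mentioned in the abstract can be illustrated with a standard Fenchel-duality argument. What follows is only a sketch under assumed notation that does not appear on this page: a value network $V_\theta$, a dual function $\nu$, a discount factor $\gamma$, a reward $r$, a target policy $\pi$, a behavior policy $\pi_b$ with importance ratio $\rho$, and a state distribution $\mu$. The paper's exact formulation may differ.

Define the off-policy Bellman residual and the MSBE as

$$
\delta_\theta(s) = \mathbb{E}_{a \sim \pi_b(\cdot \mid s),\, s'}\Big[ \rho(s,a)\,\big( r(s,a) + \gamma V_\theta(s') - V_\theta(s) \big) \,\Big|\, s \Big],
\qquad
\rho(s,a) = \frac{\pi(a \mid s)}{\pi_b(a \mid s)},
$$

$$
\mathrm{MSBE}(\theta) = \tfrac{1}{2}\, \mathbb{E}_{s \sim \mu}\big[ \delta_\theta(s)^2 \big].
$$

Applying the identity $\tfrac{1}{2} x^2 = \max_{y} \big( x y - \tfrac{1}{2} y^2 \big)$ pointwise in $s$ gives

$$
\min_\theta \mathrm{MSBE}(\theta)
= \min_\theta \max_{\nu}\; \mathbb{E}_{s \sim \mu,\, a \sim \pi_b,\, s'}\Big[ \rho(s,a)\,\big( r(s,a) + \gamma V_\theta(s') - V_\theta(s) \big)\, \nu(s) - \tfrac{1}{2}\, \nu(s)^2 \Big],
$$

where the inner maximum is taken over all square-integrable functions $\nu$ and is attained at $\nu = \delta_\theta$. Under these assumptions, parameterizing $V_\theta$ and $\nu$ by a pair of NNs yields a primal-dual problem of the kind the abstract describes. Restricting $\nu$ to an NN class makes the inner maximum a lower bound on the MSBE in general.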
Author Information
Hoi-To Wai (The Chinese University of Hong Kong)
Zhuoran Yang (Princeton)
Zhaoran Wang (Northwestern University)
Mingyi Hong (University of Minnesota)
More from the Same Authors
2021 : A Unified Framework to Understand Decentralized and Federated Optimization Algorithms: A Multi-Rate Feedback Control Perspective »
xinwei zhang · Mingyi Hong · Nicola Elia -
2021 : GPU-Podracer: Scalable and Elastic Library for Cloud-Native Deep Reinforcement Learning »
Xiao-Yang Liu · Zhuoran Yang · Zhaoran Wang · Anwar Walid · Jian Guo · Michael Jordan -
2021 : Exponential Family Model-Based Reinforcement Learning via Score Matching »
Gene Li · Junbo Li · Nathan Srebro · Zhaoran Wang · Zhuoran Yang -
2022 Poster: RORL: Robust Offline Reinforcement Learning via Conservative Smoothing »
Rui Yang · Chenjia Bai · Xiaoteng Ma · Zhaoran Wang · Chongjie Zhang · Lei Han -
2022 : A Unified Framework to Understand Decentralized and Federated Optimization Algorithms: A Multi-Rate Feedback Control Perspective »
xinwei zhang · Nicola Elia · Mingyi Hong -
2022 : Building Large Machine Learning Models from Small Distributed Models: A Layer Matching Approach »
xinwei zhang · Bingqing Song · Mehrdad Honarkhah · Jie Ding · Mingyi Hong -
2022 : Sparse Q-Learning: Offline Reinforcement Learning with Implicit Value Regularization »
Haoran Xu · Li Jiang · Li Jianxiong · Zhuoran Yang · Zhaoran Wang · Xianyuan Zhan -
2022 : On the Robustness of deep learning-based MRI Reconstruction to image transformations »
jinghan jia · Mingyi Hong · Yimeng Zhang · Mehmet Akcakaya · Sijia Liu -
2022 Spotlight: Lightning Talks 5A-2 »
Qiang LI · Zhiwei Xu · Jiaqi Yang · Thai Hung Le · Haoxuan Qu · Yang Li · Artyom Sorokin · Peirong Zhang · Mira Finkelstein · Nitsan levy · Chung-Yiu Yau · dapeng li · Thommen Karimpanal George · De-Chuan Zhan · Nazar Buzun · Jiajia Jiang · Li Xu · Yichuan Mo · Yujun Cai · Yuliang Liu · Leonid Pugachev · Bin Zhang · Lucy Liu · Hoi-To Wai · Liangliang Shi · Majid Abdolshah · Yoav Kolumbus · Lin Geng Foo · Junchi Yan · Mikhail Burtsev · Lianwen Jin · Yuan Zhan · Dung Nguyen · David Parkes · Yunpeng Baiia · Jun Liu · Kien Do · Guoliang Fan · Jeffrey S Rosenschein · Sunil Gupta · Sarah Keren · Svetha Venkatesh -
2022 Spotlight: RORL: Robust Offline Reinforcement Learning via Conservative Smoothing »
Rui Yang · Chenjia Bai · Xiaoteng Ma · Zhaoran Wang · Chongjie Zhang · Lei Han -
2022 Spotlight: Multi-agent Performative Prediction with Greedy Deployment and Consensus Seeking Agents »
Qiang LI · Chung-Yiu Yau · Hoi-To Wai -
2022 Spotlight: Lightning Talks 5A-1 »
Yao Mu · Jin Zhang · Haoyi Niu · Rui Yang · Mingdong Wu · Ze Gong · shubham sharma · Chenjia Bai · Yu ("Tony") Zhang · Siyuan Li · Yuzheng Zhuang · Fangwei Zhong · Yiwen Qiu · Xiaoteng Ma · Fei Ni · Yulong Xia · Chongjie Zhang · Hao Dong · Ming Li · Zhaoran Wang · Bin Wang · Chongjie Zhang · Jianyu Chen · Guyue Zhou · Lei Han · Jianming HU · Jianye Hao · Xianyuan Zhan · Ping Luo -
2022 Poster: A Stochastic Linearized Augmented Lagrangian Method for Decentralized Bilevel Optimization »
Songtao Lu · Siliang Zeng · Xiaodong Cui · Mark Squillante · Lior Horesh · Brian Kingsbury · Jia Liu · Mingyi Hong -
2022 Poster: Inducing Equilibria via Incentives: Simultaneous Design-and-Play Ensures Global Convergence »
Boyi Liu · Jiayang Li · Zhuoran Yang · Hoi-To Wai · Mingyi Hong · Yu Nie · Zhaoran Wang -
2022 Poster: Maximum-Likelihood Inverse Reinforcement Learning with Finite-Time Guarantees »
Siliang Zeng · Chenliang Li · Alfredo Garcia · Mingyi Hong -
2022 Poster: A Unifying Framework of Off-Policy General Value Function Evaluation »
Tengyu Xu · Zhuoran Yang · Zhaoran Wang · Yingbin Liang -
2022 Poster: Relational Reasoning via Set Transformers: Provable Efficiency and Applications to MARL »
Fengzhuo Zhang · Boyi Liu · Kaixin Wang · Vincent Tan · Zhuoran Yang · Zhaoran Wang -
2022 Poster: Learn to Match with No Regret: Reinforcement Learning in Markov Matching Markets »
Yifei Min · Tianhao Wang · Ruitu Xu · Zhaoran Wang · Michael Jordan · Zhuoran Yang -
2022 Poster: Advancing Model Pruning via Bi-level Optimization »
Yihua Zhang · Yuguang Yao · Parikshit Ram · Pu Zhao · Tianlong Chen · Mingyi Hong · Yanzhi Wang · Sijia Liu -
2022 Poster: Distributed Optimization for Overparameterized Problems: Achieving Optimal Dimension Independent Communication Complexity »
Bingqing Song · Ioannis Tsaknakis · Chung-Yiu Yau · Hoi-To Wai · Mingyi Hong -
2022 Poster: Multi-agent Performative Prediction with Greedy Deployment and Consensus Seeking Agents »
Qiang LI · Chung-Yiu Yau · Hoi-To Wai -
2022 Poster: Exponential Family Model-Based Reinforcement Learning via Score Matching »
Gene Li · Junbo Li · Anmol Kabra · Nati Srebro · Zhaoran Wang · Zhuoran Yang -
2022 Poster: FinRL-Meta: Market Environments and Benchmarks for Data-Driven Financial Reinforcement Learning »
Xiao-Yang Liu · Ziyi Xia · Jingyang Rui · Jiechao Gao · Hongyang Yang · Ming Zhu · Christina Wang · Zhaoran Wang · Jian Guo -
2021 : Contributed Talk 2: A Unified Framework to Understand Decentralized and Federated Optimization Algorithms: A Multi-Rate Feedback Control Perspective »
xinwei zhang · Mingyi Hong · Nicola Elia -
2021 Poster: Pessimism Meets Invariance: Provably Efficient Offline Mean-Field Multi-Agent RL »
Minshuo Chen · Yan Li · Ethan Wang · Zhuoran Yang · Zhaoran Wang · Tuo Zhao -
2021 Poster: STEM: A Stochastic Two-Sided Momentum Algorithm Achieving Near-Optimal Sample and Communication Complexities for Federated Learning »
Prashant Khanduri · PRANAY SHARMA · Haibo Yang · Mingyi Hong · Jia Liu · Ketan Rajawat · Pramod Varshney -
2021 Poster: Exponential Bellman Equation and Improved Regret Bounds for Risk-Sensitive Reinforcement Learning »
Yingjie Fei · Zhuoran Yang · Yudong Chen · Zhaoran Wang -
2021 Poster: A Near-Optimal Algorithm for Stochastic Bilevel Optimization via Double-Momentum »
Prashant Khanduri · Siliang Zeng · Mingyi Hong · Hoi-To Wai · Zhaoran Wang · Zhuoran Yang -
2021 Poster: BooVI: Provably Efficient Bootstrapped Value Iteration »
Boyi Liu · Qi Cai · Zhuoran Yang · Zhaoran Wang -
2021 Poster: Wasserstein Flow Meets Replicator Dynamics: A Mean-Field Analysis of Representation Learning in Actor-Critic »
Yufeng Zhang · Siyu Chen · Zhuoran Yang · Michael Jordan · Zhaoran Wang -
2021 Poster: Offline Constrained Multi-Objective Reinforcement Learning via Pessimistic Dual Value Iteration »
Runzhe Wu · Yufeng Zhang · Zhuoran Yang · Zhaoran Wang -
2021 Poster: When Expressivity Meets Trainability: Fewer than $n$ Neurons Can Work »
Jiawei Zhang · Yushun Zhang · Mingyi Hong · Ruoyu Sun · Zhi-Quan Luo -
2021 Poster: Provably Efficient Causal Reinforcement Learning with Confounded Observational Data »
Lingxiao Wang · Zhuoran Yang · Zhaoran Wang -
2020 Poster: Pontryagin Differentiable Programming: An End-to-End Learning and Control Framework »
Wanxin Jin · Zhaoran Wang · Zhuoran Yang · Shaoshuai Mou -
2020 Poster: A Stochastic Path Integral Differential EstimatoR Expectation Maximization Algorithm »
Gersende Fort · Eric Moulines · Hoi-To Wai -
2020 Poster: Finding Second-Order Stationary Points Efficiently in Smooth Nonconvex Linearly Constrained Optimization Problems »
Songtao Lu · Meisam Razaviyayn · Bo Yang · Kejun Huang · Mingyi Hong -
2020 Poster: Understanding Gradient Clipping in Private SGD: A Geometric Perspective »
Xiangyi Chen · Steven Wu · Mingyi Hong -
2020 Poster: Distributed Training with Heterogeneous Data: Bridging Median- and Mean-Based Algorithms »
Xiangyi Chen · Tiancong Chen · Haoran Sun · Steven Wu · Mingyi Hong -
2020 Spotlight: Understanding Gradient Clipping in Private SGD: A Geometric Perspective »
Xiangyi Chen · Steven Wu · Mingyi Hong -
2020 Spotlight: Finding Second-Order Stationary Points Efficiently in Smooth Nonconvex Linearly Constrained Optimization Problems »
Songtao Lu · Meisam Razaviyayn · Bo Yang · Kejun Huang · Mingyi Hong -
2020 Poster: Can Temporal-Difference and Q-Learning Learn Representation? A Mean-Field Theory »
Yufeng Zhang · Qi Cai · Zhuoran Yang · Yongxin Chen · Zhaoran Wang -
2020 Oral: Can Temporal-Difference and Q-Learning Learn Representation? A Mean-Field Theory »
Yufeng Zhang · Qi Cai · Zhuoran Yang · Yongxin Chen · Zhaoran Wang -
2020 Poster: End-to-End Learning and Intervention in Games »
Jiayang Li · Jing Yu · Yu Nie · Zhaoran Wang -
2020 Poster: Provably Efficient Neural Estimation of Structural Equation Models: An Adversarial Approach »
Luofeng Liao · You-Lin Chen · Zhuoran Yang · Bo Dai · Mladen Kolar · Zhaoran Wang -
2020 Poster: Dynamic Regret of Policy Optimization in Non-Stationary Environments »
Yingjie Fei · Zhuoran Yang · Zhaoran Wang · Qiaomin Xie -
2020 Poster: On Function Approximation in Reinforcement Learning: Optimism in the Face of Large State Spaces »
Zhuoran Yang · Chi Jin · Zhaoran Wang · Mengdi Wang · Michael Jordan -
2020 Poster: Upper Confidence Primal-Dual Reinforcement Learning for CMDP with Adversarial Loss »
Shuang Qiu · Xiaohan Wei · Zhuoran Yang · Jieping Ye · Zhaoran Wang -
2020 Poster: Risk-Sensitive Reinforcement Learning: Near-Optimal Risk-Sample Tradeoff in Regret »
Yingjie Fei · Zhuoran Yang · Yudong Chen · Zhaoran Wang · Qiaomin Xie -
2020 Spotlight: Risk-Sensitive Reinforcement Learning: Near-Optimal Risk-Sample Tradeoff in Regret »
Yingjie Fei · Zhuoran Yang · Yudong Chen · Zhaoran Wang · Qiaomin Xie -
2019 : Poster Spotlight 2 »
Aaron Sidford · Mengdi Wang · Lin Yang · Yinyu Ye · Zuyue Fu · Zhuoran Yang · Yongxin Chen · Zhaoran Wang · Ofir Nachum · Bo Dai · Ilya Kostrikov · Dale Schuurmans · Ziyang Tang · Yihao Feng · Lihong Li · Denny Zhou · Qiang Liu · Rodrigo Toro Icarte · Ethan Waldie · Toryn Klassen · Rick Valenzano · Margarita Castro · Simon Du · Sham Kakade · Ruosong Wang · Minshuo Chen · Tianyi Liu · Xingguo Li · Zhaoran Wang · Tuo Zhao · Philip Amortila · Doina Precup · Prakash Panangaden · Marc Bellemare -
2019 : Lunch break and poster »
Felix Sattler · Khaoula El Mekkaoui · Neta Shoham · Cheng Hong · Florian Hartmann · Boyue Li · Daliang Li · Sebastian Caldas Rivera · Jianyu Wang · Kartikeya Bhardwaj · Tribhuvanesh Orekondy · YAN KANG · Dashan Gao · Mingshu Cong · Xin Yao · Songtao Lu · JIAHUAN LUO · Shicong Cen · Peter Kairouz · Yihan Jiang · Tzu Ming Hsu · Aleksei Triastcyn · Yang Liu · Ahmed Khaled Ragab Bayoumi · Zhicong Liang · Boi Faltings · Seungwhan Moon · Suyi Li · Tao Fan · Tianchi Huang · Chunyan Miao · Hang Qi · Matthew Brown · Lucas Glass · Junpu Wang · Wei Chen · Radu Marculescu · tomer avidor · Xueyang Wu · Mingyi Hong · Ce Ju · John Rush · Ruixiao Zhang · Youchi ZHOU · Françoise Beaufays · Yingxuan Zhu · Lei Xia -
2019 Poster: Statistical-Computational Tradeoff in Single Index Models »
Lingxiao Wang · Zhuoran Yang · Zhaoran Wang -
2019 Poster: Provably Global Convergence of Actor-Critic: A Case for Linear Quadratic Regulator with Ergodic Cost »
Zhuoran Yang · Yongxin Chen · Mingyi Hong · Zhaoran Wang -
2019 Poster: Variance Reduced Policy Evaluation with Smooth Function Approximation »
Hoi-To Wai · Mingyi Hong · Zhuoran Yang · Zhaoran Wang · Kexin Tang -
2019 Poster: On the Global Convergence of (Fast) Incremental Expectation Maximization Methods »
Belhal Karimi · Hoi-To Wai · Eric Moulines · Marc Lavielle -
2019 Poster: Convergent Policy Optimization for Safe Reinforcement Learning »
Ming Yu · Zhuoran Yang · Mladen Kolar · Zhaoran Wang -
2019 Poster: ZO-AdaMM: Zeroth-Order Adaptive Momentum Method for Black-Box Optimization »
Xiangyi Chen · Sijia Liu · Kaidi Xu · Xingguo Li · Xue Lin · Mingyi Hong · David Cox -
2018 Poster: Contrastive Learning from Pairwise Measurements »
Yi Chen · Zhuoran Yang · Yuchen Xie · Zhaoran Wang -
2018 Poster: Provable Gaussian Embedding with One Observation »
Ming Yu · Zhuoran Yang · Tuo Zhao · Mladen Kolar · Zhaoran Wang -
2018 Poster: Multi-Agent Reinforcement Learning via Double Averaging Primal-Dual Optimization »
Hoi-To Wai · Zhuoran Yang · Zhaoran Wang · Mingyi Hong -
2017 Poster: Estimating High-dimensional Non-Gaussian Multiple Index Models via Stein’s Lemma »
Zhuoran Yang · Krishnakumar Balasubramanian · Zhaoran Wang · Han Liu