Timezone: »
Poster
Provably Efficient Q-Learning with Low Switching Cost
Yu Bai · Tengyang Xie · Nan Jiang · Yu-Xiang Wang
Tue Dec 10 10:45 AM -- 12:45 PM (PST) @ East Exhibition Hall B + C #195
We take initial steps in studying PAC-MDP algorithms with limited adaptivity, that is, algorithms that change its exploration policy as infrequently as possible during regret minimization. This is motivated by the difficulty of running fully adaptive algorithms in real-world applications (such as medical domains), and we propose to quantify adaptivity using the notion of \emph{local switching cost}. Our main contribution, Q-Learning with UCB2 exploration, is a model-free algorithm for $H$-step episodic MDP that achieves sublinear regret whose local switching cost in $K$ episodes is $O(H^3SA\log K)$, and we provide a lower bound of $\Omega(HSA)$ on the local switching cost for any no-regret algorithm. Our algorithm can be naturally adapted to the concurrent setting \citep{guo2015concurrent}, which yields nontrivial results that improve upon prior work in certain aspects.
Author Information
Yu Bai (Salesforce Research)
Tengyang Xie (University of Illinois at Urbana-Champaign)
Nan Jiang (University of Illinois at Urbana-Champaign)
Yu-Xiang Wang (UC Santa Barbara)
More from the Same Authors
-
2021 : Empirical Study of Off-Policy Policy Evaluation for Reinforcement Learning »
Cameron Voloshin · Hoang Le · Nan Jiang · Yisong Yue -
2021 Spotlight: Logarithmic Regret in Feature-based Dynamic Pricing »
Jianyu Xu · Yu-Xiang Wang -
2021 Spotlight: Understanding the Under-Coverage Bias in Uncertainty Estimation »
Yu Bai · Song Mei · Huan Wang · Caiming Xiong -
2021 : Instance-dependent Offline Reinforcement Learning: From tabular RL to linear MDPs »
Ming Yin · Yu-Xiang Wang -
2022 Poster: Tiered Reinforcement Learning: Pessimism in the Face of Uncertainty and Constant Regret »
Jiawei Huang · Li Zhao · Tao Qin · Wei Chen · Nan Jiang · Tie-Yan Liu -
2022 : Generalized PTR: User-Friendly Recipes for Data-Adaptive Algorithms with Differential Privacy »
Rachel Redberg · Yuqing Zhu · Yu-Xiang Wang -
2022 : VOTING-BASED APPROACHES FOR DIFFERENTIALLY PRIVATE FEDERATED LEARNING »
Yuqing Zhu · Xiang Yu · Yi-Hsuan Tsai · Francesco Pittaluga · Masoud Faraki · Manmohan Chandraker · Yu-Xiang Wang -
2022 : Offline Reinforcement Learning with Closed-Form Policy Improvement Operators »
Jiachen Li · Edwin Zhang · Ming Yin · Qinxun Bai · Yu-Xiang Wang · William Yang Wang -
2022 : Offline Policy Evaluation for Reinforcement Learning with Adaptively Collected Data »
Sunil Madhow · Dan Qiao · Yu-Xiang Wang -
2022 : Trajectory-based Explainability Framework for Offline RL »
Shripad Deshmukh · Arpan Dasgupta · Chirag Agarwal · Nan Jiang · Balaji Krishnamurthy · Georgios Theocharous · Jayakumar Subramanian -
2022 : AMORE: A Model-based Framework for Improving Arbitrary Baseline Policies with Offline Data »
Tengyang Xie · Mohak Bhardwaj · Nan Jiang · Ching-An Cheng -
2022 : Near-Optimal Deployment Efficiency in Reward-Free Reinforcement Learning with Linear Function Approximation »
Dan Qiao · Yu-Xiang Wang -
2022 : Differentially Private Gradient Boosting on Linear Learners for Tabular Data »
Saeyoung Rho · Shuai Tang · Sergul Aydore · Michael Kearns · Aaron Roth · Yu-Xiang Wang · Steven Wu · Cedric Archambeau -
2022 : Differentially Private Bias-Term only Fine-tuning of Foundation Models »
Zhiqi Bu · Yu-Xiang Wang · Sheng Zha · George Karypis -
2022 : Contributed Talk: Differentially Private Bias-Term only Fine-tuning of Foundation Models »
Zhiqi Bu · Yu-Xiang Wang · Sheng Zha · George Karypis -
2022 : Panel on Privacy and Security in Machine Learning Systems »
Graham Cormode · Borja Balle · Yu-Xiang Wang · Alejandro Saucedo · Neil Lawrence -
2022 : Practical differential privacy »
Yu-Xiang Wang · Fariba Yousefi -
2022 : Practical differential privacy »
Yu-Xiang Wang -
2022 Spotlight: Tiered Reinforcement Learning: Pessimism in the Face of Uncertainty and Constant Regret »
Jiawei Huang · Li Zhao · Tao Qin · Wei Chen · Nan Jiang · Tie-Yan Liu -
2022 Spotlight: Lightning Talks 4A-1 »
Jiawei Huang · Su Jia · Abdurakhmon Sadiev · Ruomin Huang · Yuanyu Wan · Denizalp Goktas · Jiechao Guan · Andrew Li · Wei-Wei Tu · Li Zhao · Amy Greenwald · Jiawei Huang · Dmitry Kovalev · Yong Liu · Wenjie Liu · Peter Richtarik · Lijun Zhang · Zhiwu Lu · R Ravi · Tao Qin · Wei Chen · Hu Ding · Nan Jiang · Tie-Yan Liu -
2022 Poster: Identifying good directions to escape the NTK regime and efficiently learn low-degree plus sparse polynomials »
Eshaan Nichani · Yu Bai · Jason Lee -
2022 Poster: Beyond the Return: Off-policy Function Estimation under User-specified Error-measuring Distributions »
Audrey Huang · Nan Jiang -
2022 Poster: Efficient Phi-Regret Minimization in Extensive-Form Games via Online Mirror Descent »
Yu Bai · Chi Jin · Song Mei · Ziang Song · Tiancheng Yu -
2022 Poster: SeqPATE: Differentially Private Text Generation via Knowledge Distillation »
Zhiliang Tian · Yingxiu Zhao · Ziyue Huang · Yu-Xiang Wang · Nevin L. Zhang · He He -
2022 Poster: Differentially Private Linear Sketches: Efficient Implementations and Applications »
Fuheng Zhao · Dan Qiao · Rachel Redberg · Divyakant Agrawal · Amr El Abbadi · Yu-Xiang Wang -
2022 Poster: Interaction-Grounded Learning with Action-Inclusive Feedback »
Tengyang Xie · Akanksha Saran · Dylan J Foster · Lekan Molu · Ida Momennejad · Nan Jiang · Paul Mineiro · John Langford -
2022 Poster: A Few Expert Queries Suffices for Sample-Efficient RL with Resets and Linear Value Approximation »
Philip Amortila · Nan Jiang · Dhruv Madeka · Dean Foster -
2022 Poster: On the Statistical Efficiency of Reward-Free Exploration in Non-Linear RL »
Jinglin Chen · Aditya Modi · Akshay Krishnamurthy · Nan Jiang · Alekh Agarwal -
2022 Poster: Policy Optimization for Markov Games: Unified Framework and Faster Convergence »
Runyu Zhang · Qinghua Liu · Huan Wang · Caiming Xiong · Na Li · Yu Bai -
2022 Poster: Optimal Dynamic Regret in LQR Control »
Dheeraj Baby · Yu-Xiang Wang -
2022 Poster: Sample-Efficient Learning of Correlated Equilibria in Extensive-Form Games »
Ziang Song · Song Mei · Yu Bai -
2021 : Retrospective Panel »
Sergey Levine · Nando de Freitas · Emma Brunskill · Finale Doshi-Velez · Nan Jiang · Rishabh Agarwal -
2021 Workshop: Offline Reinforcement Learning »
Rishabh Agarwal · Aviral Kumar · George Tucker · Justin Fu · Nan Jiang · Doina Precup · Aviral Kumar -
2021 Workshop: Privacy in Machine Learning (PriML) 2021 »
Yu-Xiang Wang · Borja Balle · Giovanni Cherubin · Kamalika Chaudhuri · Antti Honkela · Jonathan Lebensold · Casey Meehan · Mi Jung Park · Adrian Weller · Yuqing Zhu -
2021 Poster: Towards Hyperparameter-free Policy Selection for Offline Reinforcement Learning »
Siyuan Zhang · Nan Jiang -
2021 Poster: Bellman-consistent Pessimism for Offline Reinforcement Learning »
Tengyang Xie · Ching-An Cheng · Nan Jiang · Paul Mineiro · Alekh Agarwal -
2021 Poster: Privately Publishable Per-instance Privacy »
Rachel Redberg · Yu-Xiang Wang -
2021 Poster: Logarithmic Regret in Feature-based Dynamic Pricing »
Jianyu Xu · Yu-Xiang Wang -
2021 Poster: Optimal Uniform OPE and Model-based Offline Reinforcement Learning in Time-Homogeneous, Reward-Free and Task-Agnostic Settings »
Ming Yin · Yu-Xiang Wang -
2021 Poster: Sample-Efficient Learning of Stackelberg Equilibria in General-Sum Games »
Yu Bai · Chi Jin · Huan Wang · Caiming Xiong -
2021 Oral: Bellman-consistent Pessimism for Offline Reinforcement Learning »
Tengyang Xie · Ching-An Cheng · Nan Jiang · Paul Mineiro · Alekh Agarwal -
2021 Poster: Understanding the Under-Coverage Bias in Uncertainty Estimation »
Yu Bai · Song Mei · Huan Wang · Caiming Xiong -
2021 Poster: Towards Instance-Optimal Offline Reinforcement Learning with Pessimism »
Ming Yin · Yu-Xiang Wang -
2021 Poster: Policy Finetuning: Bridging Sample-Efficient Offline and Online Reinforcement Learning »
Tengyang Xie · Nan Jiang · Huan Wang · Caiming Xiong · Yu Bai -
2021 Poster: Near-Optimal Offline Reinforcement Learning via Double Variance Reduction »
Ming Yin · Yu Bai · Yu-Xiang Wang -
2020 : Towards Reliable Validation and Evaluation for Offline RL »
Nan Jiang -
2020 : Panel »
Emma Brunskill · Nan Jiang · Nando de Freitas · Finale Doshi-Velez · Sergey Levine · John Langford · Lihong Li · George Tucker · Rishabh Agarwal · Aviral Kumar -
2020 Workshop: Privacy Preserving Machine Learning - PriML and PPML Joint Edition »
Borja Balle · James Bell · Aurélien Bellet · Kamalika Chaudhuri · Adria Gascon · Antti Honkela · Antti Koskela · Casey Meehan · Olga Ohrimenko · Mi Jung Park · Mariana Raykova · Mary Anne Smart · Yu-Xiang Wang · Adrian Weller -
2020 Poster: Domain Adaptation with Conditional Distribution Matching and Generalized Label Shift »
Remi Tachet des Combes · Han Zhao · Yu-Xiang Wang · Geoffrey Gordon -
2020 Poster: Adaptive Online Estimation of Piecewise Polynomial Trends »
Dheeraj Baby · Yu-Xiang Wang -
2020 Poster: Improving Sparse Vector Technique with Renyi Differential Privacy »
Yuqing Zhu · Yu-Xiang Wang -
2019 : Poster and Coffee Break 2 »
Karol Hausman · Kefan Dong · Ken Goldberg · Lihong Li · Lin Yang · Lingxiao Wang · Lior Shani · Liwei Wang · Loren Amdahl-Culleton · Lucas Cassano · Marc Dymetman · Marc Bellemare · Marcin Tomczak · Margarita Castro · Marius Kloft · Marius-Constantin Dinu · Markus Holzleitner · Martha White · Mengdi Wang · Michael Jordan · Mihailo Jovanovic · Ming Yu · Minshuo Chen · Moonkyung Ryu · Muhammad Zaheer · Naman Agarwal · Nan Jiang · Niao He · Nikolaus Yasui · Nikos Karampatziakis · Nino Vieillard · Ofir Nachum · Olivier Pietquin · Ozan Sener · Pan Xu · Parameswaran Kamalaruban · Paul Mineiro · Paul Rolland · Philip Amortila · Pierre-Luc Bacon · Prakash Panangaden · Qi Cai · Qiang Liu · Quanquan Gu · Raihan Seraj · Richard Sutton · Rick Valenzano · Robert Dadashi · Rodrigo Toro Icarte · Roshan Shariff · Roy Fox · Ruosong Wang · Saeed Ghadimi · Samuel Sokota · Sean Sinclair · Sepp Hochreiter · Sergey Levine · Sergio Valcarcel Macua · Sham Kakade · Shangtong Zhang · Sheila McIlraith · Shie Mannor · Shimon Whiteson · Shuai Li · Shuang Qiu · Wai Lok Li · Siddhartha Banerjee · Sitao Luan · Tamer Basar · Thinh Doan · Tianhe Yu · Tianyi Liu · Tom Zahavy · Toryn Klassen · Tuo Zhao · Vicenç Gómez · Vincent Liu · Volkan Cevher · Wesley Suttle · Xiao-Wen Chang · Xiaohan Wei · Xiaotong Liu · Xingguo Li · Xinyi Chen · Xingyou Song · Yao Liu · YiDing Jiang · Yihao Feng · Yilun Du · Yinlam Chow · Yinyu Ye · Yishay Mansour · · Yonathan Efroni · Yongxin Chen · Yuanhao Wang · Bo Dai · Chen-Yu Wei · Harsh Shrivastava · Hongyang Zhang · Qinqing Zheng · SIDDHARTHA SATPATHI · Xueqing Liu · Andreu Vall -
2019 Poster: Online Forecasting of Total-Variation-bounded Sequences »
Dheeraj Baby · Yu-Xiang Wang -
2019 Poster: Enhancing the Locality and Breaking the Memory Bottleneck of Transformer on Time Series Forecasting »
Shiyang Li · Xiaoyong Jin · Yao Xuan · Xiyou Zhou · Wenhu Chen · Yu-Xiang Wang · Xifeng Yan -
2019 Poster: Towards Optimal Off-Policy Evaluation for Reinforcement Learning with Marginalized Importance Sampling »
Tengyang Xie · Yifei Ma · Yu-Xiang Wang -
2018 : Contributed talk 2: Subsampled Renyi Differential Privacy and Analytical Moments Accountant »
Yu-Xiang Wang -
2018 Poster: A Block Coordinate Ascent Algorithm for Mean-Variance Optimization »
Tengyang Xie · Bo Liu · Yangyang Xu · Mohammad Ghavamzadeh · Yinlam Chow · Daoming Lyu · Daesub Yoon -
2017 Poster: Higher-Order Total Variation Classes on Grids: Minimax Theory and Trend Filtering Methods »
Veeranjaneyulu Sadhanala · Yu-Xiang Wang · James Sharpnack · Ryan Tibshirani -
2016 : Optimal and Adaptive Off-policy Evaluation in Contextual Bandits »
Yu-Xiang Wang -
2016 Poster: Total Variation Classes Beyond 1d: Minimax Rates, and the Limitations of Linear Smoothers »
Veeranjaneyulu Sadhanala · Yu-Xiang Wang · Ryan Tibshirani -
2015 : Yu-Xiang Wang: Learning with differential privacy: stability, learnability and the sufficiency and necessity of ERM principle »
Yu-Xiang Wang -
2015 Poster: Differentially private subspace clustering »
Yining Wang · Yu-Xiang Wang · Aarti Singh -
2013 Poster: Provable Subspace Clustering: When LRR meets SSC »
Yu-Xiang Wang · Huan Xu · Chenlei Leng -
2013 Spotlight: Provable Subspace Clustering: When LRR meets SSC »
Yu-Xiang Wang · Huan Xu · Chenlei Leng