Timezone: »
Poster
Accommodating Picky Customers: Regret Bound and Exploration Complexity for Multi-Objective Reinforcement Learning
Jingfeng Wu · Vladimir Braverman · Lin Yang
In this paper we consider multi-objective reinforcement learning where the objectives are balanced using preferences. In practice, the preferences are often given in an adversarial manner, e.g., customers can be picky in many applications. We formalize this problem as an episodic learning problem on a Markov decision process, where transitions are unknown and a reward function is the inner product of a preference vector with pre-specified multi-objective reward functions. We consider two settings. In the online setting, the agent receives a (adversarial) preference every episode and proposes policies to interact with the environment. We provide a model-based algorithm that achieves a nearly minimax optimal regret bound $\widetilde{\mathcal{O}}\bigl(\sqrt{\min\{d,S\}\cdot H^2 SAK}\bigr)$, where $d$ is the number of objectives, $S$ is the number of states, $A$ is the number of actions, $H$ is the length of the horizon, and $K$ is the number of episodes. Furthermore, we consider preference-free exploration, i.e., the agent first interacts with the environment without specifying any preference and then is able to accommodate arbitrary preference vector up to $\epsilon$ error. Our proposed algorithm is provably efficient with a nearly optimal trajectory complexity $\widetilde{\mathcal{O}}\bigl({\min\{d,S\}\cdot H^3 SA}/{\epsilon^2}\bigr)$. This result partly resolves an open problem raised by \citet{jin2020reward}.
Author Information
Jingfeng Wu (Johns Hopkins University)
Vladimir Braverman (Johns Hopkins University)
Lin Yang (UCLA)
More from the Same Authors
-
2021 Spotlight: Coresets for Clustering with Missing Values »
Vladimir Braverman · Shaofeng Jiang · Robert Krauthgamer · Xuan Wu -
2021 : Doubly Pessimistic Algorithms for Strictly Safe Off-Policy Optimization »
Sanae Amani · Lin Yang -
2022 : Bidirectional Adaptive Communication for Heterogeneous Distributed Learning »
Dmitrii Avdiukhin · Vladimir Braverman · Nikita Ivkin · Sebastian Stich -
2022 : From Local to Global: Spectral-Inspired Graph Neural Networks »
Ningyuan Huang · Soledad Villar · Carey E Priebe · Da Zheng · Chengyue Huang · Lin Yang · Vladimir Braverman -
2022 Spotlight: Risk Bounds of Multi-Pass SGD for Least Squares in the Interpolation Regime »
Difan Zou · Jingfeng Wu · Vladimir Braverman · Quanquan Gu · Sham Kakade -
2022 Poster: Provably Feedback-Efficient Reinforcement Learning via Active Reward Learning »
Dingwen Kong · Lin Yang -
2022 Poster: The Power and Limitation of Pretraining-Finetuning for Linear Regression under Covariate Shift »
Jingfeng Wu · Difan Zou · Vladimir Braverman · Quanquan Gu · Sham Kakade -
2022 Poster: Risk Bounds of Multi-Pass SGD for Least Squares in the Interpolation Regime »
Difan Zou · Jingfeng Wu · Vladimir Braverman · Quanquan Gu · Sham Kakade -
2022 Poster: Learning from Distributed Users in Contextual Linear Bandits Without Sharing the Context »
Osama Hanna · Lin Yang · Christina Fragouli -
2022 Poster: Near-Optimal Sample Complexity Bounds for Constrained MDPs »
Sharan Vaswani · Lin Yang · Csaba Szepesvari -
2021 Poster: Coresets for Clustering with Missing Values »
Vladimir Braverman · Shaofeng Jiang · Robert Krauthgamer · Xuan Wu -
2021 Poster: The Benefits of Implicit Regularization from SGD in Least Squares Problems »
Difan Zou · Jingfeng Wu · Vladimir Braverman · Quanquan Gu · Dean Foster · Sham Kakade -
2021 Poster: Breaking the Moments Condition Barrier: No-Regret Algorithm for Bandits with Super Heavy-Tailed Payoffs »
Han Zhong · Jiayi Huang · Lin Yang · Liwei Wang -
2021 Poster: On the Value of Interaction and Function Approximation in Imitation Learning »
Nived Rajaraman · Yanjun Han · Lin Yang · Jingbo Liu · Jiantao Jiao · Kannan Ramchandran -
2021 Poster: Adversarial Robustness of Streaming Algorithms through Importance Sampling »
Vladimir Braverman · Avinatan Hassidim · Yossi Matias · Mariano Schain · Sandeep Silwal · Samson Zhou -
2020 Poster: Planning with General Objective Functions: Going Beyond Total Rewards »
Ruosong Wang · Peilin Zhong · Simon Du · Russ Salakhutdinov · Lin Yang -
2020 Poster: Is Long Horizon RL More Difficult Than Short Horizon RL? »
Ruosong Wang · Simon Du · Lin Yang · Sham Kakade -
2020 Poster: Toward the Fundamental Limits of Imitation Learning »
Nived Rajaraman · Lin Yang · Jiantao Jiao · Kannan Ramchandran -
2020 Poster: Is Plug-in Solver Sample-Efficient for Feature-based Reinforcement Learning? »
Qiwen Cui · Lin Yang -
2020 Poster: Preference-based Reinforcement Learning with Finite-Time Guarantees »
Yichong Xu · Ruosong Wang · Lin Yang · Aarti Singh · Artur Dubrawski -
2020 Spotlight: Preference-based Reinforcement Learning with Finite-Time Guarantees »
Yichong Xu · Ruosong Wang · Lin Yang · Aarti Singh · Artur Dubrawski -
2020 Poster: On Reward-Free Reinforcement Learning with Linear Function Approximation »
Ruosong Wang · Simon Du · Lin Yang · Russ Salakhutdinov -
2020 Poster: Provably Efficient Exploration for Reinforcement Learning Using Unsupervised Learning »
Fei Feng · Ruosong Wang · Wotao Yin · Simon Du · Lin Yang -
2020 Poster: Reinforcement Learning with General Value Function Approximation: Provably Efficient Approach via Bounded Eluder Dimension »
Ruosong Wang · Russ Salakhutdinov · Lin Yang -
2020 Poster: Model-Based Multi-Agent RL in Zero-Sum Markov Games with Near-Optimal Sample Complexity »
Kaiqing Zhang · Sham Kakade · Tamer Basar · Lin Yang -
2020 Spotlight: Model-Based Multi-Agent RL in Zero-Sum Markov Games with Near-Optimal Sample Complexity »
Kaiqing Zhang · Sham Kakade · Tamer Basar · Lin Yang -
2020 Spotlight: Provably Efficient Exploration for Reinforcement Learning Using Unsupervised Learning »
Fei Feng · Ruosong Wang · Wotao Yin · Simon Du · Lin Yang -
2019 : Poster Spotlight 2 »
Aaron Sidford · Mengdi Wang · Lin Yang · Yinyu Ye · Zuyue Fu · Zhuoran Yang · Yongxin Chen · Zhaoran Wang · Ofir Nachum · Bo Dai · Ilya Kostrikov · Dale Schuurmans · Ziyang Tang · Yihao Feng · Lihong Li · Denny Zhou · Qiang Liu · Rodrigo Toro Icarte · Ethan Waldie · Toryn Klassen · Rick Valenzano · Margarita Castro · Simon Du · Sham Kakade · Ruosong Wang · Minshuo Chen · Tianyi Liu · Xingguo Li · Zhaoran Wang · Tuo Zhao · Philip Amortila · Doina Precup · Prakash Panangaden · Marc Bellemare -
2019 Poster: Efficient Symmetric Norm Regression via Linear Sketching »
Zhao Song · Ruosong Wang · Lin Yang · Hongyang Zhang · Peilin Zhong -
2019 Poster: Communication-efficient Distributed SGD with Sketching »
Nikita Ivkin · Daniel Rothchild · Enayat Ullah · Vladimir Braverman · Ion Stoica · Raman Arora -
2018 Poster: The Physical Systems Behind Optimization Algorithms »
Lin Yang · Raman Arora · Vladimir Braverman · Tuo Zhao -
2018 Poster: Differentially Private Robust Low-Rank Approximation »
Raman Arora · Vladimir Braverman · Jalaj Upadhyay -
2017 : Poster Session »
Tsz Kit Lau · Johannes Maly · Nicolas Loizou · Christian Kroer · Yuan Yao · Youngsuk Park · Reka Agnes Kovacs · Dong Yin · Vlad Zhukov · Woosang Lim · David Barmherzig · Dimitris Metaxas · Bin Shi · Rajan Udwani · William Brendel · Yi Zhou · Vladimir Braverman · Sijia Liu · Eugene Golikov