Poster
Preference-based Reinforcement Learning with Finite-Time Guarantees
Yichong Xu · Ruosong Wang · Lin Yang · Aarti Singh · Artur Dubrawski
Preference-based Reinforcement Learning (PbRL) replaces reward values in traditional reinforcement learning with preferences to better elicit human opinion on the target objective, especially when numerical reward values are hard to design or interpret.
Despite promising results in applications, the theoretical understanding of PbRL is still in its infancy.
In this paper, we present the first finite-time analysis for general PbRL problems.
We first show that, for PbRL, a unique optimal policy may not exist if preferences over trajectories are deterministic.
If preferences are stochastic and the preference probability relates to the hidden reward values, we present algorithms for PbRL, both with and without a simulator, that identify the best policy up to accuracy $\varepsilon$ with high probability. Our method explores the state space by navigating to under-explored states and solves PbRL using a combination of dueling bandits and policy search.
Experiments demonstrate the efficacy of our method on real-world problems.
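The stochastic-preference setting described above can be illustrated with a small toy sketch. The abstract only says that the preference probability relates to the hidden reward values; here we assume, purely for illustration, a logistic (Bradley-Terry style) link P(τ_a ≻ τ_b) = σ(r(τ_a) − r(τ_b)), and we simplify by treating each candidate policy as always producing a trajectory with a fixed hidden return. The names `compare` and `copeland_winner` are hypothetical, and the round-robin Copeland rule is a generic dueling-bandit baseline, not the authors' algorithm, which additionally navigates to under-explored states and uses policy search.

```python
import math
import random
from typing import List


def sigmoid(x: float) -> float:
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def compare(r_a: float, r_b: float, rng: random.Random) -> bool:
    """Simulated preference query under an ASSUMED logistic link.

    Returns True when trajectory a is preferred over trajectory b, which
    happens with probability sigmoid(r_a - r_b) given hidden returns r_a, r_b.
    """
    return rng.random() < sigmoid(r_a - r_b)


def copeland_winner(returns: List[float], n_duels: int, rng: random.Random) -> int:
    """Round-robin dueling among candidate policies (hypothetical baseline).

    returns[i] is the hidden return of candidate policy i; this function only
    uses it to simulate noisy pairwise preferences via compare(). Each pair is
    dueled n_duels times, the majority winner earns one Copeland point, and the
    index with the most points is returned (ties go to the lowest index).
    """
    k = len(returns)
    points = [0] * k
    for i in range(k):
        for j in range(i + 1, k):
            i_wins = sum(compare(returns[i], returns[j], rng) for _ in range(n_duels))
            if 2 * i_wins > n_duels:
                points[i] += 1
            elif 2 * i_wins < n_duels:
                points[j] += 1
            # an exact tie awards no point to either candidate
    return max(range(k), key=lambda idx: points[idx])
```

For example, `copeland_winner([0.0, 1.0, 3.0], 2000, random.Random(0))` should return 2, the candidate with the largest hidden return, because with 2000 duels per pair every pairwise majority is decided correctly with overwhelming probability. A round-robin spends `n_duels` queries on each of the k(k-1)/2 pairs; adaptive dueling-bandit methods instead stop comparing a pair once its winner is statistically clear, which saves queries when some candidates are easy to distinguish.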
Author Information
Yichong Xu (Microsoft)
Ruosong Wang (Carnegie Mellon University)
Lin Yang (UCLA)
Aarti Singh (CMU)
Artur Dubrawski (Carnegie Mellon University)
Related Events (a corresponding poster, oral, or spotlight)
2020 Spotlight: Preference-based Reinforcement Learning with Finite-Time Guarantees
Wed Dec 9th 04:20 -- 04:30 AM Room Orals & Spotlights: Reinforcement Learning
More from the Same Authors
2020 Session: Orals & Spotlights Track 33: Health/AutoML/(Soft|Hard)ware
Dustin Tran · Artur Dubrawski
2020 Poster: Planning with General Objective Functions: Going Beyond Total Rewards
Ruosong Wang · Peilin Zhong · Simon Du · Russ Salakhutdinov · Lin Yang
2020 Poster: Is Long Horizon RL More Difficult Than Short Horizon RL?
Ruosong Wang · Simon Du · Lin Yang · Sham Kakade
2020 Poster: Toward the Fundamental Limits of Imitation Learning
Nived Rajaraman · Lin Yang · Jiantao Jiao · Kannan Ramchandran
2020 Poster: Is Plug-in Solver Sample-Efficient for Feature-based Reinforcement Learning?
Qiwen Cui · Lin Yang
2020 Poster: Agnostic $Q$-learning with Function Approximation in Deterministic Systems: Near-Optimal Bounds on Approximation Error and Sample Complexity
Simon Du · Jason Lee · Gaurav Mahajan · Ruosong Wang
2020 Poster: On Reward-Free Reinforcement Learning with Linear Function Approximation
Ruosong Wang · Simon Du · Lin Yang · Russ Salakhutdinov
2020 Poster: Provably Efficient Exploration for Reinforcement Learning Using Unsupervised Learning
Fei Feng · Ruosong Wang · Wotao Yin · Simon Du · Lin Yang
2020 Poster: Reinforcement Learning with General Value Function Approximation: Provably Efficient Approach via Bounded Eluder Dimension
Ruosong Wang · Russ Salakhutdinov · Lin Yang
2020 Poster: Model-Based Multi-Agent RL in Zero-Sum Markov Games with Near-Optimal Sample Complexity
Kaiqing Zhang · Sham Kakade · Tamer Basar · Lin Yang
2020 Spotlight: Model-Based Multi-Agent RL in Zero-Sum Markov Games with Near-Optimal Sample Complexity
Kaiqing Zhang · Sham Kakade · Tamer Basar · Lin Yang
2020 Spotlight: Provably Efficient Exploration for Reinforcement Learning Using Unsupervised Learning
Fei Feng · Ruosong Wang · Wotao Yin · Simon Du · Lin Yang
2019 Poster: On Testing for Biases in Peer Review
Ivan Stelmakh · Nihar Shah · Aarti Singh
2019 Spotlight: On Testing for Biases in Peer Review
Ivan Stelmakh · Nihar Shah · Aarti Singh
2019 Poster: Graph Neural Tangent Kernel: Fusing Graph Neural Networks with Graph Kernels
Simon Du · Kangcheng Hou · Russ Salakhutdinov · Barnabas Poczos · Ruosong Wang · Keyulu Xu
2019 Poster: Efficient Symmetric Norm Regression via Linear Sketching
Zhao Song · Ruosong Wang · Lin Yang · Hongyang Zhang · Peilin Zhong
2019 Poster: Provably Efficient Q-learning with Function Approximation via Distribution Shift Error Checking Oracle
Simon Du · Yuping Luo · Ruosong Wang · Hanrui Zhang
2019 Poster: Mutually Regressive Point Processes
Ifigeneia Apostolopoulou · Scott Linderman · Kyle Miller · Artur Dubrawski
2019 Poster: On Exact Computation with an Infinitely Wide Neural Net
Sanjeev Arora · Simon Du · Wei Hu · Zhiyuan Li · Russ Salakhutdinov · Ruosong Wang
2019 Spotlight: On Exact Computation with an Infinitely Wide Neural Net
Sanjeev Arora · Simon Du · Wei Hu · Zhiyuan Li · Russ Salakhutdinov · Ruosong Wang
2018 Poster: How Many Samples are Needed to Estimate a Convolutional Neural Network?
Simon Du · Yining Wang · Xiyu Zhai · Sivaraman Balakrishnan · Russ Salakhutdinov · Aarti Singh
2018 Poster: Optimization of Smooth Functions with Noisy Observations: Local Minimax Rates
Yining Wang · Sivaraman Balakrishnan · Aarti Singh
2017 Poster: Hypothesis Transfer Learning via Transformation Functions
Simon Du · Jayanth Koushik · Aarti Singh · Barnabas Poczos
2017 Poster: Gradient Descent Can Take Exponential Time to Escape Saddle Points
Simon Du · Chi Jin · Jason D Lee · Michael Jordan · Aarti Singh · Barnabas Poczos
2017 Spotlight: Gradient Descent Can Take Exponential Time to Escape Saddle Points
Simon Du · Chi Jin · Jason D Lee · Michael Jordan · Aarti Singh · Barnabas Poczos
2017 Poster: On the Power of Truncated SVD for General High-rank Matrix Estimation Problems
Simon Du · Yining Wang · Aarti Singh
2017 Poster: Noise-Tolerant Interactive Learning Using Pairwise Comparisons
Yichong Xu · Hongyang Zhang · Aarti Singh · Artur Dubrawski · Kyle Miller
2016 Poster: Data Poisoning Attacks on Factorization-Based Collaborative Filtering
Bo Li · Yining Wang · Aarti Singh · Yevgeniy Vorobeychik
2015 Poster: Differentially private subspace clustering
Yining Wang · Yu-Xiang Wang · Aarti Singh
2015 Demonstration: An interactive system for the extraction of meaningful visualizations from high-dimensional data
Madalina Fiterau · Artur Dubrawski · Donghan Wang
2013 Poster: Near-optimal Anomaly Detection in Graphs using Lovasz Extended Scan Statistic
James L Sharpnack · Akshay Krishnamurthy · Aarti Singh
2013 Poster: Low-Rank Matrix and Tensor Completion via Adaptive Sampling
Akshay Krishnamurthy · Aarti Singh
2013 Poster: Minimax Theory for High-dimensional Gaussian Mixtures with Sparse Mean Separation
Martin Azizyan · Aarti Singh · Larry Wasserman
2013 Poster: Cluster Trees on Manifolds
Sivaraman Balakrishnan · Srivatsan Narayanan · Alessandro Rinaldo · Aarti Singh · Larry Wasserman
2012 Workshop: Algebraic Topology and Machine Learning
Sivaraman Balakrishnan · Alessandro Rinaldo · Donald Sheehy · Aarti Singh · Larry Wasserman
2012 Poster: Projection Retrieval for Classification
Madalina Fiterau · Artur Dubrawski
2011 Poster: Minimax Localization of Structural Information in Large Noisy Matrices
Mladen Kolar · Sivaraman Balakrishnan · Alessandro Rinaldo · Aarti Singh
2011 Poster: Noise Thresholds for Spectral Clustering
Sivaraman Balakrishnan · Min Xu · Akshay Krishnamurthy · Aarti Singh
2011 Spotlight: Noise Thresholds for Spectral Clustering
Sivaraman Balakrishnan · Min Xu · Akshay Krishnamurthy · Aarti Singh
2011 Spotlight: Minimax Localization of Structural Information in Large Noisy Matrices
Mladen Kolar · Sivaraman Balakrishnan · Alessandro Rinaldo · Aarti Singh
2010 Oral: Identifying graph-structured activation patterns in networks
James L Sharpnack · Aarti Singh
2010 Poster: Identifying graph-structured activation patterns in networks
James L Sharpnack · Aarti Singh
2008 Poster: Unlabeled data: Now it helps, now it doesn't
Aarti Singh · Rob Nowak · Jerry Zhu
2008 Poster: Learning the Semantic Correlation: An Alternative Way to Gain from Unlabeled Text
Yi Zhang · Jeff Schneider · Artur Dubrawski
2008 Oral: Unlabeled data: Now it helps, now it doesn't
Aarti Singh · Rob Nowak · Jerry Zhu