Potential-based reward shaping is a powerful technique for accelerating the convergence of reinforcement learning algorithms. The shaping information typically includes an estimate of the optimal value function and is often provided by a human expert or another source of domain knowledge. However, this information is often biased or inaccurate and can mislead many reinforcement learning algorithms. In this paper, we apply Bayesian Model Combination with multiple experts in a way that learns to trust a good combination of experts as training progresses. This approach is both computationally efficient and general, and is shown numerically to improve convergence across discrete and continuous domains and across different reinforcement learning algorithms.
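As a rough illustration of the idea in the abstract, the sketch below uses the standard potential-based shaping form, in which a transition from s to s' receives the extra reward gamma * Phi(s') - Phi(s), and takes Phi to be a weighted combination of several expert potentials. It is not the paper's algorithm: the function names, the two toy experts on a one-dimensional chain, and the fixed weights are all assumptions made for this example, and the Bayesian update that would adapt the weights during training is not reproduced here.

```python
from typing import Callable, Sequence

Potential = Callable[[int], float]


def combined_potential(weights: Sequence[float],
                       experts: Sequence[Potential],
                       state: int) -> float:
    """Weighted combination of expert potentials (weights assumed to sum to 1)."""
    return sum(w * phi(state) for w, phi in zip(weights, experts))


def shaped_reward(reward: float, phi_s: float, phi_next: float,
                  gamma: float) -> float:
    """Environment reward plus the potential-based shaping term."""
    return reward + gamma * phi_next - phi_s


# Hypothetical experts for a 1-D chain whose goal is state 10.
def informative_expert(state: int) -> float:
    # Potential rises as the agent approaches the goal.
    return -float(abs(10 - state))


def uninformative_expert(state: int) -> float:
    # Carries no information about the goal.
    return 0.0


experts = [informative_expert, uninformative_expert]
weights = [0.75, 0.25]  # assumed fixed here; the paper learns the trust in each expert

phi_3 = combined_potential(weights, experts, 3)
phi_4 = combined_potential(weights, experts, 4)
step_reward = shaped_reward(0.0, phi_3, phi_4, gamma=1.0)
```

With these toy values, a step toward the goal earns a positive shaped reward even though the environment reward is zero, and the size of that bonus scales with the weight placed on the informative expert.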
Author Information
Michael Gimelfarb (University of Toronto)
Scott Sanner (University of Toronto)
Chi-Guhn Lee (University of Toronto)
More from the Same Authors
- 2021 Poster: Risk-Aware Transfer in Reinforcement Learning using Successor Features
  Michael Gimelfarb · Andre Barreto · Scott Sanner · Chi-Guhn Lee
- 2017 Poster: Scalable Planning with Tensorflow for Hybrid Nonlinear Domains
  Ga Wu · Buser Say · Scott Sanner