Keeping Your Distance: Solving Sparse Reward Tasks Using Self-Balancing Shaped Rewards
Alexander Trott · Stephan Zheng · Caiming Xiong · Richard Socher

Wed Dec 11th 05:00 -- 07:00 PM @ East Exhibition Hall B + C #205

While using shaped rewards can be beneficial when solving sparse reward tasks, their successful application often requires careful engineering and is problem-specific. For instance, in tasks where the agent must achieve some goal state, simple distance-to-goal reward shaping often fails, as it renders learning vulnerable to local optima. We introduce a simple and effective model-free method to learn from shaped distance-to-goal rewards on tasks where success depends on reaching a goal state. Our method introduces an auxiliary distance-based reward, computed from pairs of rollouts, to encourage diverse exploration. This approach effectively prevents learning dynamics from stabilizing around local optima induced by naive distance-to-goal reward shaping and enables policies to efficiently solve sparse reward tasks. Our augmented objective does not require any additional reward engineering or domain expertise to implement, and it converges to the original sparse objective as the agent learns to solve the task. We demonstrate that our method successfully solves a variety of hard-exploration tasks (including maze navigation and 3D construction in a Minecraft environment) where naive distance-based reward shaping fails and where intrinsic curiosity and reward relabeling strategies perform poorly.
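
The abstract does not spell out the auxiliary reward, so the following is only an illustrative sketch of how a distance-to-goal reward might be balanced by a second term derived from a paired rollout. It assumes Euclidean distance, a hypothetical success radius, and a hypothetical function name `self_balancing_reward`; none of these are taken from the abstract. The idea shown is that a rollout is penalized for ending far from the goal, while the penalty is offset by how far it ended from its paired rollout's terminal state, so two rollouts that settle in the same local optimum gain nothing from staying there.

```python
import math
from typing import Sequence


def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance between two equal-length state vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def self_balancing_reward(
    terminal: Sequence[float],
    goal: Sequence[float],
    paired_terminal: Sequence[float],
    success_radius: float = 0.1,  # assumed threshold, not from the abstract
) -> float:
    """Terminal reward for one rollout, given the terminal state of its pair.

    Assumed form (illustrative only):
      * 1.0 if the rollout ends within `success_radius` of the goal,
        which matches a sparse success signal;
      * otherwise min(0, d(terminal, paired_terminal) - d(terminal, goal)),
        so the distance-to-goal penalty is reduced when the rollout ends
        far from where its paired rollout ended.
    """
    d_goal = euclidean(terminal, goal)
    if d_goal <= success_radius:
        return 1.0
    d_pair = euclidean(terminal, paired_terminal)
    return min(0.0, d_pair - d_goal)


# Both rollouts end in the same spot, 5 units from the goal: full penalty.
same_spot = self_balancing_reward((0.0, 0.0), (3.0, 4.0), (0.0, 0.0))

# The paired rollout ended 10 units away: the penalty is fully offset.
spread_out = self_balancing_reward((0.0, 0.0), (3.0, 4.0), (0.0, 10.0))

# The rollout ends inside the success radius: sparse success reward.
solved = self_balancing_reward((3.0, 4.0), (3.0, 4.05), (0.0, 0.0))
```

Under these assumptions, once rollouts reliably reach the goal, only the sparse success branch is active, which is consistent with the claim that the augmented objective converges to the original sparse one.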

Author Information

Alex Trott (Salesforce Research)
Stephan Zheng (Salesforce)
Caiming Xiong (Salesforce)
Richard Socher (Salesforce)

Richard Socher is Chief Scientist at Salesforce. He leads the company’s research efforts and brings state-of-the-art artificial intelligence solutions into the platform. Previously, Richard was an adjunct professor in the Stanford Computer Science Department and the CEO and founder of MetaMind, a startup acquired by Salesforce in April 2016. MetaMind’s deep learning AI platform analyzes, labels, and makes predictions on image and text data so businesses can make smarter, faster, and more accurate decisions.

More from the Same Authors