

Poster

Sim-to-Real Transfer Can Make Naive Exploration Efficient in Reinforcement Learning

Andrew Wagenmaker · Kevin Huang · Liyiming Ke · Kevin Jamieson · Abhishek Gupta

West Ballroom A-D #6507
Thu 12 Dec 11 a.m. PST — 2 p.m. PST

Abstract: In order to mitigate the sample complexity of real-world reinforcement learning, common practice is to first train a policy in a simulator where samples are cheap, and then deploy this policy in the real world, with the hope that it generalizes effectively. Such direct \emph{sim2real} transfer is not guaranteed to succeed, however, and in cases where it fails, it is unclear how to best utilize the simulator. In this work, we show that in many regimes, while direct sim2real transfer may fail, we can utilize the simulator to learn a set of \emph{exploratory} policies which enable efficient exploration in the real world. In particular, in the setting of low-rank MDPs, we show that these exploratory policies enable naive exploration methods---specifically, randomized exploration approaches such as $\epsilon$-greedy coupled with a regression oracle---to obtain polynomial sample complexity, yielding an exponential improvement over direct sim2real transfer or learning without access to a simulator. We validate our theoretical results on a realistic robotic simulator and a real-world robotic sim2real task, demonstrating that transferring exploratory policies can yield substantial gains in practice as well.
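To make the high-level idea concrete, here is a minimal illustrative sketch (not the authors' code or the paper's exact algorithm) of $\epsilon$-greedy exploration in the real environment where the exploratory draws come from policies transferred from the simulator, with a least-squares regression oracle fitting a Q-estimate. The environment interface (reset/step/num_actions), the featurize function, the linear Q-parameterization, and all names are assumptions made for illustration only.

```python
import numpy as np

def epsilon_greedy_with_sim_exploration(real_env, sim_exploratory_policies,
                                        featurize, num_episodes=200, horizon=50,
                                        epsilon=0.2, rng=None):
    """Sketch: collect real-world data by mixing a greedy policy (w.r.t. a
    regressed Q-function) with exploratory policies learned in a simulator.
    Assumed interface: real_env.reset() -> state, real_env.step(a) ->
    (next_state, reward, done), real_env.num_actions; featurize(state, a)
    returns a fixed-dimension feature vector."""
    rng = rng or np.random.default_rng(0)
    dim = featurize(real_env.reset(), 0).shape[0]
    theta = np.zeros(dim)      # weights of a linear Q-estimate (an assumption)
    data = []                  # (feature, regression target) pairs for the oracle

    for _ in range(num_episodes):
        state = real_env.reset()
        for _ in range(horizon):
            if rng.random() < epsilon:
                # Exploration step: follow a randomly chosen sim-learned policy.
                pi = sim_exploratory_policies[rng.integers(len(sim_exploratory_policies))]
                action = pi(state)
            else:
                # Greedy step w.r.t. the current regressed Q-estimate.
                q_vals = [featurize(state, a) @ theta for a in range(real_env.num_actions)]
                action = int(np.argmax(q_vals))
            next_state, reward, done = real_env.step(action)
            # Simple one-step bootstrapped regression target (TD-style).
            next_q = max(featurize(next_state, a) @ theta
                         for a in range(real_env.num_actions))
            data.append((featurize(state, action), reward + (0.0 if done else next_q)))
            state = next_state
            if done:
                break
        # Regression oracle: least-squares fit of the Q-estimate on all data so far.
        X = np.stack([x for x, _ in data])
        y = np.array([t for _, t in data])
        theta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return theta
```

The key point the sketch is meant to convey is only the mixing step: when the $\epsilon$-coin comes up "explore", the agent follows one of the simulator-trained exploratory policies rather than a uniformly random action, which is what makes otherwise naive randomized exploration cover the real environment efficiently.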
