Timezone: »

AutoRL-Bench 1.0
Gresa Shala · Sebastian Pineda Arango · André Biedenkapp · Frank Hutter · Josif Grabocka
Event URL: https://openreview.net/forum?id=RyAl60VhTcG »

It is well established that Reinforcement Learning (RL) is very brittle and sensitive to the choice of hyperparameters. This prevents RL methods from being usable out of the box.The field of automated RL (AutoRL) aims at automatically configuring the RL pipeline, to both make RL usable by a broader audience, as well as reveal its full potential. Still, there has been little progress towards this goal as new AutoRL methods often are evaluated with incompatible experimental protocols.Furthermore, the typically high cost of experimentation prevents a thorough and meaningful comparison of different AutoRL methods or established hyperparameter optimization (HPO) methods from the automated Machine Learning (AutoML) community.To alleviate these issues, we propose the first tabular AutoRL Benchmark for studying the hyperparameters of RL algorithms. We consider the hyperparameter search spaces of five well established RL methods (PPO, DDPG, A2C, SAC, TD3) across 22 environments for which we compute and provide the reward curves. This enables HPO methods to simply query our benchmark as a lookup table, instead of actually training agents. Thus, our benchmark offers a testbed for very fast, fair, and reproducible experimental protocols for comparing future black-box, gray-box, and online HPO methods for RL.

Author Information

Gresa Shala (Universität Freiburg)
Sebastian Pineda Arango (Albert-Ludwigs-Universität Freiburg)
André Biedenkapp (University of Freiburg)
Frank Hutter (University of Freiburg & Bosch)

Frank Hutter is a Full Professor for Machine Learning at the Computer Science Department of the University of Freiburg (Germany), where he previously was an assistant professor 2013-2017. Before that, he was at the University of British Columbia (UBC) for eight years, for his PhD and postdoc. Frank's main research interests lie in machine learning, artificial intelligence and automated algorithm design. For his 2009 PhD thesis on algorithm configuration, he received the CAIAC doctoral dissertation award for the best thesis in AI in Canada that year, and with his coauthors, he received several best paper awards and prizes in international competitions on machine learning, SAT solving, and AI planning. Since 2016 he holds an ERC Starting Grant for a project on automating deep learning based on Bayesian optimization, Bayesian neural networks, and deep reinforcement learning.

Josif Grabocka (Universität Freiburg)

More from the Same Authors