Gray-Box Gaussian Processes for Automated Reinforcement Learning
Gresa Shala · André Biedenkapp · Frank Hutter · Josif Grabocka
Event URL: https://openreview.net/forum?id=oJp7uTL7ox-

Despite having achieved spectacular milestones in an array of important real-world applications, most Reinforcement Learning (RL) methods are very brittle with respect to their hyperparameters. Notwithstanding the crucial importance of setting the hyperparameters in training state-of-the-art agents, the task of hyperparameter optimization (HPO) in RL is understudied. In this paper, we propose a novel gray-box Bayesian Optimization technique for HPO in RL that enriches Gaussian Processes with reward curve estimations based on generalized logistic functions. We thus reason about the performance of learning algorithms, transferring information across configurations and across epochs of the learning algorithm. In a very large-scale experimental protocol, comprising 5 popular RL methods (DDPG, A2C, PPO, SAC, TD3), 22 environments (OpenAI Gym: Mujoco, Atari, Classic Control), and 7 HPO baselines, we demonstrate that our method significantly outperforms current HPO practices in RL.
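
For readers unfamiliar with the curve family named in the abstract, the following is a minimal sketch of a generalized logistic (Richards) function evaluated over training epochs. It is not the authors' model: the parameterization is the standard textbook form, and the function name, parameter names, and example values are assumptions chosen for illustration.

```python
import numpy as np


def generalized_logistic(t, lower, upper, growth, shift, nu):
    """Standard Richards curve (hypothetical parameter names).

    lower/upper are the two asymptotes, growth sets the rate of increase,
    shift moves the curve along the t axis, and nu controls where the
    fastest growth occurs relative to the asymptotes.
    """
    t = np.asarray(t, dtype=float)
    return lower + (upper - lower) / (1.0 + shift * np.exp(-growth * t)) ** (1.0 / nu)


# Illustrative reward curve over 101 epochs (values are made up).
epochs = np.arange(0, 101)
curve = generalized_logistic(
    epochs, lower=0.0, upper=200.0, growth=0.1, shift=50.0, nu=1.0
)
```

With positive growth and shift, the curve rises monotonically from near the lower asymptote toward the upper one, which is the saturating shape typically seen in reward curves.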

Author Information

Gresa Shala (Universität Freiburg)
André Biedenkapp (University of Freiburg)
Frank Hutter (University of Freiburg & Bosch)

Frank Hutter is a Full Professor for Machine Learning at the Computer Science Department of the University of Freiburg (Germany), where he was previously an assistant professor from 2013 to 2017. Before that, he spent eight years at the University of British Columbia (UBC) for his PhD and postdoc. Frank's main research interests lie in machine learning, artificial intelligence, and automated algorithm design. For his 2009 PhD thesis on algorithm configuration, he received the CAIAC doctoral dissertation award for the best thesis in AI in Canada that year, and with his coauthors, he received several best paper awards and prizes in international competitions on machine learning, SAT solving, and AI planning. Since 2016, he has held an ERC Starting Grant for a project on automating deep learning based on Bayesian optimization, Bayesian neural networks, and deep reinforcement learning.

Josif Grabocka (Universität Freiburg)

More from the Same Authors