Transfer of recent advances in deep reinforcement learning to real-world applications is hindered by high data demands and thus low efficiency and scalability. Through independent improvements of components such as replay buffers or more stable learning algorithms, and through massively distributed systems, training time could be reduced from several days to several hours for standard benchmark tasks. However, while rewards in simulated environments are well-defined and easy to compute, reward evaluation becomes the bottleneck in many real-world environments, e.g., in molecular optimization tasks, where computationally demanding simulations or even experiments are required to evaluate states and to quantify rewards. Training can therefore become prohibitively expensive without extensive computational resources and time. We propose to alleviate this problem by replacing costly ground-truth rewards with rewards modeled by neural networks, counteracting the non-stationarity of state and reward distributions during training with an active learning component. We demonstrate that our proposed ACRL method (actively learning costly rewards for reinforcement learning) makes it possible to train agents in complex real-world environments orders of magnitude faster. By enabling the application of reinforcement learning methods to new domains, we show that we can find interesting and non-trivial solutions to real-world optimization problems in chemistry, materials science, and engineering.
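The core idea, substituting a learned reward model for the costly oracle and querying the ground truth only where the model is uncertain, can be sketched as follows. This is a minimal illustration under assumed details, not the paper's implementation: the quadratic `true_reward` stand-in, the bootstrap ensemble of ridge regressors on random features (in place of the neural-network reward models the abstract describes), and all hyperparameters (`threshold`, `warmup`, etc.) are hypothetical choices for the sketch.

```python
import numpy as np

def true_reward(state):
    # Hypothetical stand-in for a costly ground-truth reward
    # (e.g. an expensive simulation or experiment).
    return -np.sum((state - 1.0) ** 2)

class EnsembleRewardModel:
    """Bootstrap ensemble of ridge regressors on random cosine features.

    Disagreement across ensemble members serves as the uncertainty
    estimate that triggers active queries of the costly reward.
    """

    def __init__(self, dim, n_features=128, n_members=5, reg=1e-3, seed=0):
        self.rng = np.random.default_rng(seed)
        self.W = self.rng.normal(size=(dim, n_features))      # random projection
        self.b = self.rng.uniform(0, 2 * np.pi, n_features)   # random phases
        self.n_members = n_members
        self.reg = reg
        self.weights = None

    def _features(self, X):
        return np.cos(X @ self.W + self.b)

    def fit(self, X, y):
        Phi = self._features(X)
        n, d = Phi.shape
        self.weights = []
        for _ in range(self.n_members):
            idx = self.rng.integers(0, n, n)                  # bootstrap resample
            P, t = Phi[idx], y[idx]
            w = np.linalg.solve(P.T @ P + self.reg * np.eye(d), P.T @ t)
            self.weights.append(w)

    def predict(self, x):
        phi = self._features(x[None, :])
        preds = np.array([float(phi @ w) for w in self.weights])
        return preds.mean(), preds.std()                      # estimate + disagreement

def run(n_steps=500, dim=2, threshold=0.5, warmup=32, seed=0):
    """Training loop skeleton: cheap modeled rewards by default,
    costly ground-truth queries only under high model uncertainty."""
    rng = np.random.default_rng(seed)
    model = EnsembleRewardModel(dim, seed=seed)
    X, y = [], []
    n_queries = 0
    for _ in range(n_steps):
        state = rng.uniform(-2, 3, dim)        # stand-in for a policy rollout
        if len(X) < warmup:
            r, queried = true_reward(state), True
        else:
            mu, sigma = model.predict(state)
            if sigma > threshold:
                r, queried = true_reward(state), True   # uncertain: query oracle
            else:
                r, queried = mu, False                  # confident: use model
        if queried:
            n_queries += 1
            X.append(state)
            y.append(r)
            model.fit(np.array(X), np.array(y))
        # `r` would be fed to the RL update here.
    return model, n_queries
```

The key design choice this sketch demonstrates is the uncertainty-gated oracle call: as the reward model covers more of the visited state distribution, the fraction of steps that pay for a ground-truth evaluation shrinks, which is what the abstract's speed-up claim rests on.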