When learning control policies by trial and error directly on hardware, ensuring safety is crucial to avoid costly damage to the system. Existing model-free reinforcement learning methods that guarantee safety during exploration can only find optima within the safe region connected to a safe initialization, and these optima may be worse than the globally optimal safe solution. In this work, we present GoSafe, an algorithm that can search for globally optimal policies while guaranteeing safety, and we demonstrate its applicability in experiments on a real robot arm.