Bayesian optimization has been shown to be a powerful tool for solving black-box problems during online accelerator optimization. A major advantage of Bayesian optimization techniques is their ability to incorporate prior information about the problem to speed up optimization, even if that information is not perfectly correlated with experimental measurements. In parallel, neural network surrogate system models of accelerator facilities are increasingly being made available, but at present they are not widely used in online optimization. In this work, we demonstrate the use of an approximate neural network surrogate model as a prior mean for the Gaussian processes used in Bayesian optimization in a realistic setting. We show that the initial performance of Bayesian optimization is improved by using neural network surrogate models, even when the surrogate models make erroneous predictions. Finally, we quantify the requirements on surrogate prediction accuracy needed to achieve improved optimization performance when solving problems in high-dimensional input spaces.
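To make the central idea concrete, the following is a minimal sketch of a Gaussian process posterior mean with a non-zero prior mean, written in plain NumPy. It is an illustration under stated assumptions, not the implementation used in this work: the one-dimensional `true_objective`, the deliberately imperfect `surrogate` (a stand-in for a neural network surrogate), the squared-exponential kernel, and all hyperparameter values are hypothetical choices made only for this example.

```python
import numpy as np


def rbf_kernel(a, b, lengthscale=0.3, variance=1.0):
    """Squared-exponential kernel between two 1D arrays of inputs."""
    diff = a[:, None] - b[None, :]
    return variance * np.exp(-0.5 * (diff / lengthscale) ** 2)


def gp_posterior_mean(x_train, y_train, x_test, prior_mean, noise=1e-6):
    """GP posterior mean for an arbitrary prior mean function.

    The GP effectively models the residual y - prior_mean(x); the prior
    mean is then added back at the test points.
    """
    K = rbf_kernel(x_train, x_train) + noise * np.eye(len(x_train))
    K_star = rbf_kernel(x_test, x_train)
    residual = y_train - prior_mean(x_train)
    return prior_mean(x_test) + K_star @ np.linalg.solve(K, residual)


def true_objective(x):
    # Hypothetical black-box objective (assumption for illustration).
    return np.sin(3.0 * x)


def surrogate(x):
    # Hypothetical imperfect surrogate: wrong amplitude and phase.
    return 0.8 * np.sin(3.0 * x + 0.3)


def zero_mean(x):
    # Conventional zero prior mean, used as the baseline.
    return np.zeros_like(x)


# Only two measurements, mimicking the start of an optimization run.
x_train = np.array([0.1, 0.9])
y_train = true_objective(x_train)
x_test = np.linspace(0.0, 2.0, 50)

mean_zero = gp_posterior_mean(x_train, y_train, x_test, zero_mean)
mean_nn = gp_posterior_mean(x_train, y_train, x_test, surrogate)

# Root-mean-square error of each posterior mean against the objective.
err_zero = np.sqrt(np.mean((mean_zero - true_objective(x_test)) ** 2))
err_nn = np.sqrt(np.mean((mean_nn - true_objective(x_test)) ** 2))
print(f"RMSE with zero prior mean:      {err_zero:.3f}")
print(f"RMSE with surrogate prior mean: {err_nn:.3f}")
```

In this toy setting the model still interpolates the measured points, while far from the data its prediction falls back to the surrogate rather than to zero. This is one way to see why an approximate prior mean can help most in the early iterations, when few measurements are available.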