We propose a novel scheme for fitting heavily parameterized non-linear stochastic differential equations (SDEs). We assign a prior on the parameters of the SDE drift and diffusion functions to achieve a Bayesian model. We then infer this model using the well-known local reparameterized trick for the first time for empirical Bayes, i.e. to integrate out the SDE parameters. The model is then fit by maximizing the likelihood of the resultant marginal with respect to a potentially large number of hyperparameters, which prohibits stable training. As the prior parameters are marginalized, the model also no longer provides a principled means to incorporate prior knowledge. We overcome both of these drawbacks by deriving a training loss that comprises the marginal likelihood of the predictor and a PAC-Bayesian complexity penalty. We observe on synthetic as well as real-world time series prediction tasks that our method provides an improved model fit accompanied with favorable extrapolation properties when provided a partial description of the environment dynamics. Hence, we view the outcome as a promising attempt for building cutting-edge hybrid learning systems that effectively combine first-principle physics and data-driven approaches.