Ensembles are widely used in machine learning and often achieve state-of-the-art performance in prediction tasks. The diversity of ensemble members has long been identified as a key factor in this superior performance, but the exact role that diversity plays in an ensemble model remains an open question. In this work, we employ a second-order PAC-Bayesian analysis to shed light on this problem in the context of neural network ensembles. More precisely, we provide sound theoretical answers to the following questions: how to measure diversity, how diversity relates to the generalization error, and how ensemble learning algorithms can promote diversity. The analysis covers three widely used loss functions (the squared loss, the cross-entropy loss, and the 0-1 loss) and two widely used model combination strategies (model averaging and weighted majority vote). We empirically validate the theoretical analysis on ensembles of neural networks.