Recently, neural network potentials (NNPs) have been shown to be particularly effective for atomistic simulations in computational material discovery, and large-scale datasets have begun to emerge with the aim of ensuring versatility. However, we show that even with a large dataset and a model that achieves good validation accuracy, the resulting energy surface can be quite delicate and can easily reach unrealistic extrapolation regions during a simulation. We first demonstrate this behavior using a DimeNet++ model trained on the Open Catalyst 2020 (OC20) dataset. Based on this observation, we hypothesize that for NNP models to attain versatility, the training dataset should contain a diverse set of virtual structures. To verify this, we created a much smaller benchmark dataset called the "High-temperature Multi-Element 2021" (HME21) dataset, which was sampled through high-temperature molecular dynamics simulations and encodes less prior information. We conduct benchmark experiments on HME21 and show that a TeaNet model trained on HME21 achieves better performance in reproducing the adsorption process, even though HME21 does not contain the corresponding atomic structures. Our findings indicate that dataset diversity can be more essential than dataset quantity when training universal NNPs for material discovery.