Benchmarking of Universal Machine Learning Interatomic Potentials for Structural Relaxation
Abstract
The development of increasingly robust machine learning models for computational materials science is escalating interest in integrating these models into real-world simulation workflows. Although strong model performance is reported, the evaluation benchmarks typically report only a single error metric for the model's designated task. It is therefore difficult to predict how these models will perform in common workflows such as atomistic relaxations. A more comprehensive set of testing benchmarks is consequently needed to evaluate model performance on these dynamic tasks. A relaxation test is applied to three widely used models, namely CHGNet, M3GNet, and MACE. The results show that, although these models report similar benchmark metrics, they can exhibit significantly different behavior in the relaxation test, even when trained on similar or identical datasets.
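As a minimal sketch of the kind of relaxation test described above, a universal machine learning interatomic potential can be attached to a structure as an ASE calculator and driven with a standard optimizer. The example below assumes the mace-torch package's mace_mp entry point is available; the input file name, force tolerance, and step limit are illustrative choices, not values taken from this work.

from ase.io import read
from ase.optimize import BFGS
from mace.calculators import mace_mp  # assumes mace-torch is installed

# Load a candidate structure (hypothetical file name).
atoms = read("structure.cif")

# Attach a universal MLIP as an ASE calculator.
atoms.calc = mace_mp(model="medium")

# Relax atomic positions until the maximum force falls below an
# illustrative tolerance of 0.05 eV/Angstrom, capped at 500 steps.
opt = BFGS(atoms)
opt.run(fmax=0.05, steps=500)

print("Final energy (eV):", atoms.get_potential_energy())

An analogous script can be run with CHGNet or M3GNet calculators, which allows the relaxed geometries and energies from each model to be compared on the same set of starting structures.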