A Multi-Target Dataset for AI-Driven Electronic Structure and Materials Discovery
Cesare Malosso · Joseph Abbott · Philip Loche · Arslan Mazitov · Paolo Pegolo · Davide Tisi · Michele Ceriotti
Abstract
Recent advances in foundation models for atomistic machine learning have enabled accurate and efficient predictions of energies and forces across diverse systems, bringing first-principles accuracy within reach at a fraction of the computational cost. However, most existing models neglect critical electronic-structure properties. To address this gap, we aim to develop a multi-target dataset that expands in two key directions: (i) inclusion of a broad range of electronic-structure targets, such as polarization, polarizability, density of states, electron densities, and Hamiltonians, and (ii) coverage of a wide portion of the periodic table, including noble gases, transition metals, lanthanides, and actinides, as well as broad classes of materials such as molecules, solids, liquids, electrolytes, surface/adsorbate systems, metal–organic frameworks, and 2D materials. To achieve this, we will generate a high-quality, internally consistent, and diverse dataset using the all-electron FHI-aims code. Because the proposed dataset is of an affordable size (~500,000 structures), it is feasible to compute all properties with the r$^{2}$SCAN meta-GGA functional, which is particularly well suited to sensitive electronic-structure properties.