XAS Spectrum–Structure Bench: Towards a Protein-Data-Bank–Level Resource for Experimental X-ray Absorption Spectroscopy
Abstract
X-ray absorption spectroscopy (XAS) provides element-specific insight into local atomic structure, but the inverse problem of inferring structural descriptors from spectra remains challenging for machine learning approaches because validated spectrum–structure datasets are lacking. Existing XAS repositories focus on data storage rather than AI applications and lack ground-truth structural labels backed by validation evidence. We introduce the XAS Spectrum–Structure Bench, the first curated dataset designed specifically for machine learning in XAS analysis. Our framework establishes rigorous validation standards for both spectral quality (using fingerprint-based metrics beyond energy calibration) and structural assignments (requiring auxiliary characterization evidence such as XRD, EXAFS fitting, or forward simulations). The initial release targets ~4,000 validated spectrum–structure pairs from major synchrotron facilities, processed through expert-guided interpretation and computational validation. For long-term sustainability, the resource relies on community contribution and publication-linked deposition models inspired by the Protein Data Bank. It will enable benchmark development for spectrum classification, inverse structure prediction, and transfer learning in operando studies, establishing reproducible standards for XAS-based machine learning research.