Quantum materials “ImageNet”: an open experimental dataset for AI-driven discovery
Abstract
Discovering quantum materials remains a combinatorial, high-cost search problem, constrained by small, fragmented experimental datasets and by simulations that miss real-world synthesis effects. We propose a standardized, multi-modal experimental corpus that links synthesis–structure–property relationships for two complementary material classes, two-dimensional crystals and quantum dots, to enable inverse design of materials, realistic property prediction, and hierarchical experiment planning. The dataset will couple synthesis recipes and measurement context with experimental data on structural, chemical, electronic, optical, and magnetic properties, together with quantified uncertainties and clear quality-control flags. Data will be collected via four streams: purpose-built high-throughput campaigns (the main source), literature mining with expert curation, integration with existing databases, and industry/academic partnerships. The resource targets ~1,000,000 sample records in 3–5 years, enabled by platform automation that reduces cost per data point by ~100× and boosts throughput by 1000×. We will release a common schema, APIs, and benchmark splits spanning inverse recipe generation, uncertainty-aware active learning, and cross-modal representation learning, with strong baselines to catalyze progress. By grounding models in measured behavior, this dataset turns trial-and-error search into targeted predict–make–measure loops, accelerating advances in quantum computing, ultra-efficient AI hardware, sensing and secure communications, and clean-energy materials.