AI4O3: A Foundational Data Collection for Artificial Intelligence in Tropospheric Ozone Research
Makoto Kelp · Sebastian H. M. Hickman · Kazuyuki Miyazaki · Kai-Lan Chang · Paul Griffiths · Qindan Zhu · Gerbrand Koren · Fernando Iglesias-Suarez · Elyse Pennington · Martin Schultz
Abstract
Ozone ($O_3$) is an internationally regulated air pollutant that damages human health, vegetation, and the climate, yet remains one of the most challenging atmospheric constituents to predict. Tropospheric ozone is driven by complex, nonlinear interactions between precursor emissions, meteorology, and chemistry across scales. Despite its importance, AI development for ozone prediction has lagged behind weather forecasting due to fragmented, heterogeneous data and the lack of standardized benchmarks. Because ozone responds to nearly all physical and chemical processes in the atmosphere, it is a powerful diagnostic for evaluating AI model skill and failure modes. We present AI4O3, the first large-scale, harmonized dataset for AI-based tropospheric ozone forecasting, integrating surface, satellite, balloon, aircraft, and reanalysis data. Spanning multiple decades and vertical levels, AI4O3 will provide co-located hourly data on ozone and key covariates in an AI-ready format. Our benchmark is designed to test model skill on both typical and extreme events, such as wildfires and heatwaves, where ozone often spikes. We will first release AI4O3v1 for North America and Europe, including metadata, quality flags, uncertainty estimates, and supporting variables relevant to ozone formation. The dataset will be cloud-hosted with automated pipelines for preprocessing, benchmarking, and future global expansion.