ClimSim: A large multi-scale dataset for hybrid physics-ML climate emulation
Sungduk Yu · Walter Hannah · Liran Peng · Jerry Lin · Mohamed Aziz Bhouri · Ritwik Gupta · Björn Lütjens · Justus C. Will · Gunnar Behrens · Julius Busecke · Nora Loose · Charles Stern · Tom Beucler · Bryce Harrop · Benjamin Hillman · Andrea Jenney · Savannah L. Ferretti · Nana Liu · Animashree Anandkumar · Noah Brenowitz · Veronika Eyring · Nicholas Geneva · Pierre Gentine · Stephan Mandt · Jaideep Pathak · Akshay Subramaniam · Carl Vondrick · Rose Yu · Laure Zanna · Tian Zheng · Ryan Abernathey · Fiaz Ahmed · David Bader · Pierre Baldi · Elizabeth Barnes · Christopher Bretherton · Peter Caldwell · Wayne Chuang · Yilun Han · YU HUANG · Fernando Iglesias-Suarez · Sanket Jantre · Karthik Kashinath · Marat Khairoutdinov · Thorsten Kurth · Nicholas Lutsko · Po-Lun Ma · Griffin Mooers · J. David Neelin · David Randall · Sara Shamekh · Mark Taylor · Nathan Urban · Janni Yuval · Guang Zhang · Mike Pritchard

Wed Dec 13 01:45 PM -- 02:00 PM (PST)
Event URL: http://leap-stc.github.io/ClimSim/

Modern climate projections lack adequate spatial and temporal resolution due to computational constraints. A consequence is inaccurate and imprecise predictions of critical processes such as storms. Hybrid methods that combine physics with machine learning (ML) have introduced a new generation of higher fidelity climate simulators that can sidestep Moore's Law by outsourcing compute-hungry, short, high-resolution simulations to ML emulators. However, this hybrid ML-physics simulation approach requires domain-specific treatment and has been inaccessible to ML experts because of a lack of training data and relevant, easy-to-use workflows. We present ClimSim, the largest-ever dataset designed for hybrid ML-physics research. It comprises multi-scale climate simulations, developed by a consortium of climate scientists and ML researchers. It consists of 5.7 billion pairs of multivariate input and output vectors that isolate the influence of locally-nested, high-resolution, high-fidelity physics on a host climate simulator's macro-scale physical state. The dataset is global in coverage, spans multiple years at high sampling frequency, and is designed such that resulting emulators are compatible with downstream coupling into operational climate simulators. We implement a range of deterministic and stochastic regression baselines to highlight the ML challenges and their scoring. The data (https://huggingface.co/datasets/LEAP/ClimSim_high-res) and code (https://leap-stc.github.io/ClimSim) are released openly to support the development of hybrid ML-physics and high-fidelity climate simulations for the benefit of science and society.

Author Information

Sungduk Yu (UC Irvine)
Walter Hannah (Lawrence Livermore National Labs)
Liran Peng
Jerry Lin (UC Irvine)
Mohamed Aziz Bhouri (Columbia University)
Ritwik Gupta (University of California, Berkeley)
Björn Lütjens (Massachusetts Institute of Technology)
Justus C. Will (University of California, Irvine)
Justus C. Will

Research at the intersection of Computer Science and Mathematics, currently focused on Deep Generative Models, Neural Data Compression, and the application of Machine Learning to Climate Science.

Gunnar Behrens (Institute of Atmospheric Physics, German Aerospace Center (DLR))
Julius Busecke (Columbia University / LDEO)
Nora Loose (Princeton University)
Charles Stern (Columbia University)
Tom Beucler (University of Lausanne)
Bryce Harrop
Benjamin Hillman
Andrea Jenney (Oregon State University)
Savannah L. Ferretti (University of California, Irvine)
Nana Liu (University of California, Irvine)
Animashree Anandkumar (NVIDIA / Caltech)
Noah Brenowitz (NVIDIA)
Veronika Eyring (Deutsches Zentrum für Luft- und Raumfahrt e.V. (DLR), Institut für Physik der Atmosphäre, Oberpfaffenhofen, Germany; University of Bremen, Institute of Environmental Physics (IUP), Bremen, Germany)
Nicholas Geneva (NVIDIA)
Pierre Gentine (Columbia University)
Stephan Mandt (University of California, Irvine)
Jaideep Pathak (NVIDIA Corporation)
Akshay Subramaniam (NVIDIA)
Akshay Subramaniam

I am a Senior AI Developer Technology Engineer at NVIDIA. I got my PhD in Aeronautics from Stanford in 2019 working on topics related to turbulent flows and fluid mechanics, large eddy simulations, numerical methods and physics-informed ML. At NVIDIA, I have been working on topics involving the convergence of physics and deep learning as well as data compression, recommender systems and automatic speech recognition.

Carl Vondrick (Columbia University)
Rose Yu (UC San Diego)
Laure Zanna (New York University)
Tian Zheng (Columbia University)
Tian Zheng

Tian Zheng is currently Professor and Department Chair of Statistics at Columbia University. In her research, she develops novel methods for exploring and understanding patterns in complex data from different application domains such as biology, psychology, and climate modeling. Her current projects are in the fields of statistical machine learning, spatiotemporal modeling, and social network analysis. Professor Zheng’s research has been recognized by the 2008 Outstanding Statistical Application Award from the American Statistical Association (ASA), the Mitchell Prize from ISBA, and a Google research award. She became a Fellow of the American Statistical Association in 2014 and a Fellow of the Institute of Mathematical Statistics in 2022. From 2017 to 2020, she was associate director for education at Columbia Data Science Institute. Professor Zheng received the 2017 Columbia Presidential Award for Outstanding Teaching. In 2021, she was recognized with a Lenfest Distinguished Columbia Faculty Award that recognizes the excellence of faculty as teachers and mentors of both undergraduate and graduate students.

Ryan Abernathey (Columbia University)
Fiaz Ahmed
David Bader
Pierre Baldi (UC Irvine)
Elizabeth Barnes
Christopher Bretherton (Allen Institute for AI)
Peter Caldwell (Lawrence Livermore National Lab)
Wayne Chuang
Yilun Han
YU HUANG (Columbia University)
Fernando Iglesias-Suarez (DLR)
Sanket Jantre (Brookhaven National Laboratory)
Karthik Kashinath (NVIDIA)
Marat Khairoutdinov (SUNY at Stony Brook)
Thorsten Kurth (NVIDIA)
Nicholas Lutsko
Po-Lun Ma
Griffin Mooers (UC Irvine)
J. David Neelin (University of California-Los Angeles)
David Randall (Colorado State University)
Sara Shamekh
Mark Taylor (Sandia National Labs)
Nathan Urban (Brookhaven National Laboratory)
Janni Yuval
Guang Zhang (University of California-San Diego Scripps Inst of Oceanography)
Mike Pritchard (University of California, Irvine)
