AI for Science: The Reach and Limits of AI for Scientific Discovery
Abstract
Through our proposed AI for Science workshop, we will bring together experimentalists, domain scientists, and ML researchers to discuss the reach and limits of AI for scientific discovery. We will center the discussion on three challenges that are essential to progress across scientific domains. (1) LLM reasoning across scientific domains: can present-day LLMs generate rigorously testable hypotheses and reason over experimental results that span fields such as physics, chemistry, and biology? (2) Fidelity of generative and surrogate simulators: in biology, we see a shift toward all-atom models with increasingly powerful capabilities; in chemistry, machine learning force fields are becoming more accurate and generalizable; and in climate modeling, we can now accurately predict weather 15 days out. How far can we push these limits, and which spatial or temporal scales remain intractable? (3) Experimental data scarcity and bias: modern large-scale data generation efforts such as the Protein Data Bank, the Human Cell Atlas, and the Materials Project show what coordinated data collection can achieve. In which other fields would AI benefit most from consortium efforts to generate large-scale datasets? How far can models trained on limited experimental datasets take us, and where are lab-in-the-loop strategies essential? To address this challenge, we additionally introduce a dataset proposal competition. Our workshop will highlight common bottlenecks in developing AI methods across scientific application domains and explore solutions that can unlock progress across all of them.