Workshop: AI for Science: from Theory to Practice

What a Scientific Language Model Knows and Doesn't Know about Chemistry

Lawrence Zhao · Carl Edwards · Heng Ji


Large Language Models (LLMs) show promise for changing how we interact with and control the design of other modalities, such as drugs, materials, and proteins, and for enabling scientific reasoning and planning. However, LLMs have several weaknesses: they tend to memorize rather than understand, and their implicit knowledge does not always propagate well between semantically similar inputs. In this work, we seek to distinguish what these scientific LLMs have memorized from what they actually understand. To do so, we propose a new comprehensive benchmark dataset for evaluating LLM performance on molecular property prediction. We consider Galactica 1.3B, a state-of-the-art scientific LLM, and find that different prompting strategies exhibit vastly different error rates. In-context learning generally improves performance over zero-shot prompting, and the effect is twice as large for computed properties as for experimental ones. Furthermore, we show that the model is brittle and relies on memorized information, which may limit the application of LLMs to controlling molecular discovery. Based on these findings, we suggest developing novel methods to enhance information propagation within LLMs: if we want LLMs to help us control molecular design and the scientific process, they must learn a sufficient understanding of how molecules behave in the real world.
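To make the comparison between prompting strategies concrete, here is a minimal sketch (not the authors' code) of how zero-shot and in-context (few-shot) prompts for molecular property prediction might be constructed before being passed to a text-completion model such as Galactica. The SMILES strings, property names, and values are hypothetical placeholders, and the exact prompt templates used in the paper may differ.

```python
# Illustrative prompt construction for molecular property prediction.
# Assumption: the LLM is queried through a plain text-completion interface;
# all molecules and property values below are made-up placeholders.

def zero_shot_prompt(smiles: str, prop: str) -> str:
    """Ask for a property directly, with no demonstrations."""
    return f"Molecule: {smiles}\n{prop}:"

def few_shot_prompt(examples: list[tuple[str, str]], smiles: str, prop: str) -> str:
    """Prepend (molecule, value) demonstrations so the model can pick up
    the answer format and nearby reference values from context."""
    shots = "\n".join(f"Molecule: {s}\n{prop}: {v}" for s, v in examples)
    return f"{shots}\nMolecule: {smiles}\n{prop}:"

if __name__ == "__main__":
    # Placeholder demonstrations for an in-context query.
    demos = [("CCO", "78.4"), ("CCCC", "-0.5")]
    print(zero_shot_prompt("CCN", "Boiling point (C)"))
    print(few_shot_prompt(demos, "CCN", "Boiling point (C)"))
```

Under this setup, the zero-shot prompt contains only the query molecule, while the few-shot prompt surrounds it with solved examples; the paper's finding is that the latter helps roughly twice as much for computed properties as for experimental ones.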
