

Poster in Workshop: AI for Accelerated Materials Design (AI4Mat-2023)

Accurate Prediction of Experimental Band Gaps from Large Language Model-Based Data Extraction

Samuel Yang · Shutong Li · Subhashini Venugopalan · Vahe Tshitoyan · Muratahan Aykol · Amil Merchant · Ekin Dogus Cubuk · Gowoon Cheon

Keywords: [ Data Mining ] [ experimental band gap ] [ Large language models ]


Abstract:

Machine learning is transforming materials discovery by providing rapid predictions of material properties, which enables large-scale screening for target materials. However, such models require training data. Automated data extraction from the scientific literature is promising, but current auto-generated datasets often lack sufficient accuracy as well as the critical structural and processing details that influence the properties. Using band gap as an example, we demonstrate that LLM-prompt-based extraction yields an order-of-magnitude lower error rate than these existing auto-generated datasets. Combined with additional prompts that select the subset of experimentally measured properties from pure, single-crystalline bulk materials, this produces an automatically extracted dataset that is larger and more diverse than the largest existing human-curated database of experimental band gaps. Finally, we show that a model trained on our extracted database achieves a 15% lower mean absolute error in predicted band gaps than a model trained on the existing human-curated database.
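To make the filtering step and the headline metric concrete, here is a minimal sketch of what happens after extraction, assuming each extracted band-gap value arrives as a structured record together with yes/no answers from follow-up prompts. The record type `BandGapRecord`, its field names, and the helpers `select_subset`, `mean_absolute_error`, and `relative_reduction` are all hypothetical illustrations, not the authors' code; the LLM calls themselves are not shown, and the toy numbers are invented solely to illustrate what a 15% MAE reduction means.

```python
from dataclasses import dataclass
from typing import Iterable, List, Sequence


@dataclass(frozen=True)
class BandGapRecord:
    """One extracted band-gap value plus answers to follow-up filter prompts.

    Hypothetical schema: the fields stand in for whatever structured output
    an LLM extraction prompt and its follow-up prompts would return.
    """
    material: str
    band_gap_ev: float
    is_experimental: bool    # measured value, not a computed one
    is_pure: bool            # undoped, unalloyed sample
    is_single_crystal: bool  # single-crystalline sample
    is_bulk: bool            # not a thin film, nanostructure, etc.


def select_subset(records: Iterable[BandGapRecord]) -> List[BandGapRecord]:
    """Keep only experimental values from pure, single-crystalline bulk samples."""
    return [
        r for r in records
        if r.is_experimental and r.is_pure and r.is_single_crystal and r.is_bulk
    ]


def mean_absolute_error(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """MAE = (1/n) * sum over i of |y_true[i] - y_pred[i]|."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def relative_reduction(mae_baseline: float, mae_new: float) -> float:
    """Fractional MAE reduction of a new model relative to a baseline model."""
    return (mae_baseline - mae_new) / mae_baseline


if __name__ == "__main__":
    # Toy records (illustrative only, not data from the paper).
    records = [
        BandGapRecord("Si", 1.12, True, True, True, True),     # kept
        BandGapRecord("Si", 0.60, False, True, True, True),    # computed value: dropped
        BandGapRecord("ZnO", 3.30, True, False, False, False), # doped thin film: dropped
    ]
    print([r.material for r in select_subset(records)])
    # A baseline MAE of 0.40 eV falling to 0.34 eV is a reduction of about 15%.
    print(relative_reduction(0.40, 0.34))
```

Expressing each filter criterion as a separate boolean keeps the selection logic transparent: a record survives only if every follow-up prompt answered "yes", so tightening or relaxing the dataset definition amounts to changing one condition.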
