PolUQBench: A Benchmark Study on Uncertainty Quantification of Polymer Property Prediction
Abstract
Large Language Models (LLMs) have demonstrated a remarkable ability to tackle multidomain challenges, an ability that conventional machine learning methods often lack. This makes them particularly promising for understanding the complex relationship between a material's composition and its properties, which can significantly accelerate materials design, especially for polymers. Leveraging the hidden states of domain-specific pretrained LLMs for downstream tasks such as property prediction has gained significant traction. This approach is now widely used for small molecules and proteins, and recent efforts have extended it to polymers. Beyond predictive performance, Uncertainty Quantification (UQ) is crucial for the reliability of machine learning models used as property predictors. This is particularly important for high-stakes applications such as the discovery of new functional polymers. We introduce the Polymer Property Predictor Uncertainty Quantification Benchmark (PolUQBench), a pioneering study that evaluates how effectively embeddings extracted from a Polymer Language Model represent polymer data and assesses the performance of several UQ methods for reliable polymer property prediction.