
Workshop: AI for Science: from Theory to Practice

ChatPathway: Conversational Large Language Models for Biology Pathway Detection

Yanjing Li · Hannan Xu · Haiteng Zhao · Hongyu Guo · Shengchao Liu


Biological pathways, such as protein-protein interaction and metabolic networks, are vital for understanding disease and for drug development. Databases such as KEGG are designed to store and map these pathways. However, many bioinformatics methods are limited by the constraints of these databases, and some deep learning models struggle with the complexity of biochemical reactions involving large molecules and diverse enzymes. Moreover, a thorough exploration of biological pathways requires a deep understanding of the scientific literature and prior research. Recent advances in Large Language Models (LLMs), especially ChatGPT, show promise in this respect. We first restructured data from KEGG and augmented it with structural and functional information about molecules sourced from UniProt and PubChem. Using this data, we evaluated LLMs, specifically GPT-3.5-turbo and Galactica, on predicting biochemical reactions and pathways. We also assessed their ability to predict novel pathways not covered in their training data, using findings from recently published studies. While GPT-3.5-turbo demonstrated strengths in pathway mapping, Galactica encountered challenges. This research highlights the potential of combining LLMs with biology and points toward a complementary blend of human expertise and AI in decoding biological systems.
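As a rough illustration of what restructuring a KEGG reaction and augmenting it with molecule information could look like for a reaction-prediction prompt, here is a minimal Python sketch. Everything in it is an assumption for illustration: the `Compound`, `Enzyme`, and `Reaction` classes, their fields, the prompt wording, and the `product_recall` metric are hypothetical and are not the authors' data format or evaluation protocol. No database or LLM API is called; the annotations are filled in by hand.

```python
from __future__ import annotations

from dataclasses import dataclass, field


# Hypothetical record types; field names are illustrative, not the authors' schema.
@dataclass
class Compound:
    kegg_id: str       # KEGG compound identifier, e.g. "C00031"
    name: str
    formula: str = ""  # structural info, e.g. a molecular formula as PubChem lists it


@dataclass
class Enzyme:
    ec_number: str
    name: str
    function: str = ""  # functional annotation, e.g. a UniProt-style description


@dataclass
class Reaction:
    reaction_id: str
    substrates: list[Compound]
    products: list[Compound]
    enzymes: list[Enzyme] = field(default_factory=list)


def describe_compound(c: Compound) -> str:
    """Render one compound with its identifier and optional formula."""
    parts = [c.name, f"KEGG {c.kegg_id}"]
    if c.formula:
        parts.append(f"formula {c.formula}")
    return "; ".join(parts)


def describe_enzyme(e: Enzyme) -> str:
    """Render one enzyme with its EC number and optional function."""
    text = f"{e.name} (EC {e.ec_number})"
    return f"{text}: {e.function}" if e.function else text


def build_prediction_prompt(rxn: Reaction) -> str:
    """Show substrates and enzymes, withhold products, and ask the model for them."""
    lines = ["Given the substrates and enzymes below, predict the products of the reaction."]
    lines.append("Substrates:")
    lines.extend(f"- {describe_compound(c)}" for c in rxn.substrates)
    lines.append("Enzymes:")
    lines.extend(f"- {describe_enzyme(e)}" for e in rxn.enzymes)
    lines.append("Products:")
    return "\n".join(lines)


def product_recall(predicted_names: list[str], rxn: Reaction) -> float:
    """Fraction of gold product names (case-insensitive exact match) that were predicted."""
    gold = {c.name.lower() for c in rxn.products}
    if not gold:
        return 0.0
    predicted = {p.strip().lower() for p in predicted_names}
    return len(gold & predicted) / len(gold)


# Usage: the hexokinase reaction, D-Glucose + ATP -> D-Glucose 6-phosphate + ADP.
hexokinase_rxn = Reaction(
    reaction_id="example-hexokinase",  # hypothetical label, not a KEGG reaction ID
    substrates=[
        Compound("C00031", "D-Glucose", "C6H12O6"),
        Compound("C00002", "ATP", "C10H16N5O13P3"),
    ],
    products=[
        Compound("C00092", "D-Glucose 6-phosphate", "C6H13O9P"),
        Compound("C00008", "ADP", "C10H15N5O10P2"),
    ],
    enzymes=[Enzyme("2.7.1.1", "hexokinase", "phosphorylates hexoses using ATP")],
)

prompt = build_prediction_prompt(hexokinase_rxn)
print(prompt)
print(product_recall(["ADP", "glucose"], hexokinase_rxn))  # 0.5
```

The products are deliberately left out of the prompt so that a model's answer can be scored against them. Exact name matching is the simplest possible check; a real evaluation would likely need identifier- or structure-level matching, since the same molecule can appear under many names.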
