Poster in Workshop: Generative AI and Biology (GenBio@NeurIPS2023)

Fine-tuning protein Language Models by ranking protein fitness

Minji Lee · Kyungmin Lee · Jinwoo Shin

Keywords: [ protein language model ] [ Ranking-based fine tuning ] [ Fitness prediction ]


Abstract:

Self-supervised protein language models (pLMs) have shown significant potential for predicting the impact of mutations on protein function and fitness, which is crucial for therapeutic design. However, zero-shot pLM scores often correlate only weakly with fitness, so zero-shot pLMs struggle to generate fit variants. To address this challenge, we propose a framework that fine-tunes pLMs by ranking fitness data. We show that how the ranked pairs are constructed is crucial to fine-tuning pLMs, and we provide a simple yet effective construction method that improves fitness prediction across various datasets. In experiments on ProteinGym, our method substantially improves fitness prediction even with fewer than 200 labeled examples. Furthermore, we demonstrate that our approach excels in fitness optimization tasks.
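
The abstract does not spell out the loss or the pair-construction rule, so the following is only a minimal sketch of one common way to fine-tune by ranking: build (higher-fitness, lower-fitness) pairs from labeled variants and penalize the model when its scores order a pair incorrectly. The function names, the `margin` filter, and the logistic (Bradley-Terry style) loss are all illustrative assumptions, not the authors' method.

```python
import math
from itertools import combinations


def build_ranked_pairs(fitness, margin=0.0):
    """Return (winner, loser) index pairs for variants whose measured
    fitness differs by more than `margin` (hypothetical filtering rule)."""
    pairs = []
    for i, j in combinations(range(len(fitness)), 2):
        diff = fitness[i] - fitness[j]
        if diff > margin:
            pairs.append((i, j))
        elif -diff > margin:
            pairs.append((j, i))
    return pairs


def pairwise_ranking_loss(scores, pairs):
    """Mean logistic ranking loss, -log(sigmoid(s_winner - s_loser)).

    `scores` stands in for per-variant model scores (for example, a pLM
    log-likelihood); this choice of loss is an assumption."""
    if not pairs:
        return 0.0
    total = 0.0
    for winner, loser in pairs:
        d = scores[winner] - scores[loser]
        # Numerically stable softplus(-d) = log(1 + exp(-d)).
        total += max(-d, 0.0) + math.log1p(math.exp(-abs(d)))
    return total / len(pairs)


if __name__ == "__main__":
    fitness = [0.1, 0.9, 0.5]          # toy measured fitness values
    pairs = build_ranked_pairs(fitness)
    aligned = [-1.0, 2.0, 0.5]         # scores that agree with the fitness order
    reversed_ = [2.0, -1.0, 0.5]       # scores that disagree with it
    print(pairs)
    print(pairwise_ranking_loss(aligned, pairs))
    print(pairwise_ranking_loss(reversed_, pairs))
```

In this toy setup, scores that agree with the fitness ordering give a lower loss than scores that reverse it. In an actual fine-tuning loop the scores would come from the pLM and the loss would be minimized with a gradient-based optimizer.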
