

Poster in Workshop: Generative AI and Biology (GenBio@NeurIPS2023)

Conditional Generation of Antigen Specific T-cell Receptor Sequences

Dhuvarakesh Karthikeyan · Colin Raffel · Benjamin Vincent · Alex Rubinsteyn

Keywords: [ Large language models ] [ Immunology ] [ Many to Many ] [ Seq2Seq ]


Abstract:

Training and evaluation of large language models (LLMs) for designing antigen-specific T-cell receptor (TCR) sequences is challenging due to the complex many-to-many mapping between TCRs and their targets, which is exacerbated by a severe lack of ground-truth data. Traditional NLP metrics can be artificially poor indicators of model performance since labels are concentrated on a few examples, and functional in vitro assessment of generated TCRs is time-consuming and costly. Here, we introduce TCR-BART and TCR-T5, adapted from the prominent BART and T5 models, to explore the use of these LLMs for conditional TCR sequence generation given a specific epitope of interest. To fairly evaluate such models with limited labeled examples, we propose novel evaluation metrics tailored to the sparsely sampled many-to-many nature of TCR-epitope data and investigate the interplay between accuracy and diversity of generated TCR sequences.
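To make the conditional generation setup concrete, the sketch below shows epitope-conditioned TCR sampling with a T5-style seq2seq model via the Hugging Face transformers API. The checkpoint name ("tcr-t5-checkpoint"), the per-residue (space-separated) tokenization, and the decoding parameters are illustrative assumptions; the abstract does not specify TCR-T5's actual tokenizer, checkpoint, or sampling settings, and the unique-fraction statistic at the end is only a simple diversity proxy, not one of the paper's proposed metrics.

```python
# Minimal sketch: epitope-conditioned TCR generation with a T5-style model.
# "tcr-t5-checkpoint" is a hypothetical fine-tuned checkpoint, not a released model.
from transformers import T5ForConditionalGeneration, T5Tokenizer

tokenizer = T5Tokenizer.from_pretrained("tcr-t5-checkpoint")  # hypothetical path
model = T5ForConditionalGeneration.from_pretrained("tcr-t5-checkpoint")

# Condition on a single epitope; amino acids are assumed to be tokenized
# one residue at a time (hence the spaces).
epitope = "G I L G F V F T L"  # influenza M1 epitope GILGFVFTL
inputs = tokenizer(epitope, return_tensors="pt")

# Sample several candidate CDR3 sequences for the same epitope, reflecting
# the many-to-many nature of TCR-epitope binding.
outputs = model.generate(
    **inputs,
    do_sample=True,
    top_p=0.95,
    num_return_sequences=8,
    max_new_tokens=25,
)
tcrs = [tokenizer.decode(o, skip_special_tokens=True) for o in outputs]

# A crude diversity proxy: fraction of distinct sequences among the samples.
unique_fraction = len(set(tcrs)) / len(tcrs)
print(tcrs, unique_fraction)
```

Sampling-based decoding (rather than greedy search) matters here: because many distinct TCRs can bind the same epitope, a useful generator should produce a diverse set of plausible candidates for each conditioning epitope, which is exactly the accuracy-versus-diversity trade-off the proposed metrics are meant to probe.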
