Workshop: Machine Learning in Structural Biology Workshop

FLAb: Benchmarking deep learning methods for antibody fitness prediction

Michael F Chungyoun · Jeffrey Ruffolo · Jeffrey Gray


The successful application of machine learning in therapeutic antibody design relies heavily on the ability of models to accurately represent the sequence-structure-function landscape, also known as the fitness landscape. Previous protein benchmarks (including the Critical Assessment of Function Annotation, Tasks Assessing Protein Embeddings, and FLIP) examine fitness and mutational landscapes across many protein families, but they either exclude antibody data or use very little of it. In light of this, we present the Fitness Landscape for Antibodies (FLAb), the largest therapeutic antibody design benchmark to date. FLAb currently encompasses six properties of therapeutic antibodies: (1) expression, (2) thermostability, (3) immunogenicity, (4) aggregation, (5) polyreactivity, and (6) binding affinity. We use FLAb to assess the performance of several widely adopted, pretrained deep learning models for proteins (IgLM, AntiBERTy, ProtGPT2, ProGen2, ProteinMPNN, and ESM-IF) and compare them to the physics-based Rosetta. Overall, no model's predictions correlate with all properties or across multiple datasets of similar properties, indicating that more work is needed on antibody fitness prediction. Additionally, we elucidate how wild-type origin, deep learning architecture, training data composition, parameter count, and evolutionary signal affect performance, and we identify which fitness landscapes are more readily captured by each protein model. To promote the expansion of therapeutic antibody design benchmarking, all FLAb data are freely accessible and open for additional contributions at
