

ProteinShake: Building datasets and benchmarks for deep learning on protein structures

Tim Kucera · Carlos Oliver · Dexiong Chen · Karsten Borgwardt

Great Hall & Hall B1+B2 (level 1) #105
Thu 14 Dec 3 p.m. PST — 5 p.m. PST


We present ProteinShake, a Python software package that simplifies dataset creation and model evaluation for deep learning on protein structures. Users can create custom datasets or load an extensive set of pre-processed datasets from the Protein Data Bank (PDB) and AlphaFoldDB. Each dataset is associated with prediction tasks and evaluation functions covering a broad array of biological challenges. A benchmark on these tasks shows that pre-training almost always improves performance, the optimal data modality (graphs, voxel grids, or point clouds) is task-dependent, and models struggle to generalize to new structures. ProteinShake makes protein structure data easily accessible and comparison among models straightforward, providing challenging benchmark settings with real-world implications. ProteinShake is available at:
