
Workshop: Machine Learning in Structural Biology

A kernel for continuously relaxed, discrete Bayesian optimization of protein sequences

Yevgen Zainchkovskyy · Simon Bartels · Søren Hauberg · Jes Frellsen · Wouter Boomsma


Protein sequences follow a discrete alphabet, rendering gradient-based techniques a poor choice for optimization-driven protein design. Contemporary approaches instead perform optimization in a continuous latent representation, but unfortunately the representation metric is generally a poor measure of similarity between the represented proteins. This makes (global) Bayesian optimization over such latent representations inefficient, as commonly applied covariance functions are strongly dependent on the representation metric. Here we argue in favor of using the Jensen-Shannon divergence between the represented protein sequences to define a covariance function over the latent representation. Our exploratory experiments indicate that this kernel is worth further investigation.
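To make the idea concrete, the following is a minimal sketch of a covariance function built on the Jensen-Shannon divergence. It is not the authors' implementation. It assumes that each latent point has already been decoded into an array of per-position categorical distributions over the alphabet, that the divergence is summed over positions, and that the kernel takes an exponential form with a hypothetical `lengthscale` parameter. None of these choices is specified in the abstract, and the names `js_divergence` and `js_kernel` are illustrative.

```python
import numpy as np


def js_divergence(p, q, eps=1e-12):
    """Jensen-Shannon divergence (in nats), summed over sequence positions.

    p, q: arrays of shape (sequence_length, alphabet_size), each row a
    categorical distribution. This input format is an assumption.
    """
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    m = 0.5 * (p + q)  # mixture distribution at each position
    # KL(p || m) and KL(q || m) per position; eps guards against log(0),
    # and entries with zero probability contribute zero.
    kl_pm = np.sum(p * np.log((p + eps) / (m + eps)), axis=-1)
    kl_qm = np.sum(q * np.log((q + eps) / (m + eps)), axis=-1)
    return float(np.sum(0.5 * (kl_pm + kl_qm)))


def js_kernel(p, q, lengthscale=1.0):
    """Assumed exponential kernel: k = exp(-JSD(p, q) / lengthscale)."""
    return float(np.exp(-js_divergence(p, q) / lengthscale))


# Toy example: two sequences of length 2 over a 3-letter alphabet.
p = np.array([[0.8, 0.1, 0.1], [0.2, 0.5, 0.3]])
q = np.array([[0.1, 0.8, 0.1], [0.2, 0.5, 0.3]])
k_pq = js_kernel(p, q)  # below 1, since the first positions differ
k_pp = js_kernel(p, p)  # equals 1, since the divergence is zero
```

Under these assumptions, two latent points that decode to the same sequence distribution are treated as perfectly correlated regardless of how far apart they lie in the latent space, which is the property the abstract argues a latent-space metric fails to provide.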
