Designing Biological Sequences via Meta-Reinforcement Learning and Bayesian Optimization
Abstract
Designing functionally interesting biological sequences poses challenges due to the combinatorially large search space of the problem. Accelerating exploration of this landscape can therefore have a substantial impact on progress in the medical field. Motivated by this, we propose MetaRLBO, where we (1) train an autoregressive generative model via Meta-Reinforcement Learning, augmented with surrogate reward functions and an exploration bonus, to navigate the sequence space efficiently. The Meta-RL policy is trained over a distribution of beliefs (i.e., proxy oracles) about the objective function, encouraging the policy to generate diverse sequences. Because wet-lab evaluations (true function evaluations) are large-batch and low-round in nature, we (2) perform more targeted evaluations through Bayesian Optimization. Our in-silico experiments show that meta-learning over such ensembles provides robustness against reward misspecification and achieves competitive results compared to strong existing baselines.