Workshop: AI for Accelerated Materials Design (AI4Mat)

Differential top-k learning for template-based single-step retrosynthesis

Andres M Bran · Philippe Schwaller

Keywords: [ loss function ] [ template based ] [ differential top-k ] [ Retrosynthesis ]


Retrosynthesis is one of the core tasks in the organic molecule design cycle, yet producing suitable sets of precursors for a desired product remains a computational challenge. Commonly used template-based approaches reduce single-step retrosynthesis to a multi-class classification task. However, the reactions in available datasets are noisy and incomplete, which makes standard training methods problematic. In this work, considering that multiple disconnections are possible for a given product, we propose training models with differential top-k losses. We show that these loss functions yield improvements in every top-N metric, with little overhead relative to cross-entropy. More powerful models, more diverse and complete datasets, and other methodologies are expected to yield significant improvements on this task when combined with the training approach presented here.
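
To make the contrast with cross-entropy concrete, the sketch below compares standard softmax cross-entropy with a simple softplus-smoothed top-k margin loss for a single product. This is an illustrative surrogate chosen for brevity, not the loss used in this work, whose exact formulation the abstract does not state. The function names, the temperature `tau`, and the toy scores are all assumptions made for the example.

```python
import math


def cross_entropy(scores, target):
    """Softmax cross-entropy for one example (scores are template logits)."""
    m = max(scores)
    log_z = m + math.log(sum(math.exp(s - m) for s in scores))
    return log_z - scores[target]


def smooth_topk_margin_loss(scores, target, k=3, tau=1.0):
    """Softplus-smoothed top-k margin loss (hypothetical, for illustration).

    Compares the recorded template's score with the k-th largest score among
    all other templates. The loss is small once the recorded template ranks
    within the top k, so other plausible disconnections that score higher
    are penalised less than under cross-entropy.
    """
    if not 1 <= k <= len(scores) - 1:
        raise ValueError("k must be between 1 and the number of rival classes")
    rivals = sorted(
        (s for i, s in enumerate(scores) if i != target), reverse=True
    )
    kth_rival = rivals[k - 1]
    z = (kth_rival - scores[target]) / tau
    # Numerically stable softplus: log(1 + exp(z)).
    return tau * (max(z, 0.0) + math.log1p(math.exp(-abs(z))))


# Toy logits over four templates; the recorded template (index 1) is ranked
# second, behind another template that may also be a valid disconnection.
toy_scores = [2.0, 1.5, 1.0, -1.0]
recorded = 1

ce = cross_entropy(toy_scores, recorded)
top1 = smooth_topk_margin_loss(toy_scores, recorded, k=1)
top2 = smooth_topk_margin_loss(toy_scores, recorded, k=2)
print(f"cross-entropy: {ce:.4f}  top-1 margin: {top1:.4f}  top-2 margin: {top2:.4f}")
```

With k = 2, the loss compares the recorded template against the second-best rival rather than the best one, so a higher-scoring alternative disconnection does not dominate the training signal. In practice such a loss would be written in an automatic-differentiation framework and averaged over a batch; the pure-Python form here only shows the arithmetic.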
