Expo Demonstration
Real-time Navigation of Chemical Space with Cloud-Based Inference from MoLFormer
Payel Das · Brian Belgodere

Mon Nov 28 08:00 AM -- 10:00 AM & Mon Nov 28 12:00 PM -- 02:00 PM (PST) @ Ballroom C

We present MoLFormer, a large chemical language model: an efficient transformer encoder trained on chemical SMILES that uses rotary positional embeddings. The model combines a linear attention mechanism with highly distributed self-supervised training on SMILES sequences of 1.1 billion molecules from the PubChem and ZINC datasets. Experiments on several molecular property classification and regression tasks show the generality of the learned molecular representation. Further analysis through the lens of attention shows that spatial relationships between atoms emerge within MoLFormer, even though it is trained on chemical SMILES. We also present a cloud-based, real-time platform that lets users virtually navigate chemical space and screen molecules of interest. For an input chemical, the platform uses molecular embeddings inferred from MoLFormer to retrieve nearest neighbors and their metadata. Based on the platform's functionality and the results obtained, we believe it adds value in automating chemistry and can assist in drug discovery and material design tasks.
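
The retrieval step described in the abstract (embed an input chemical, then return its nearest neighbors with their metadata) can be illustrated with a minimal sketch. This is not the platform's implementation: the toy 3-dimensional vectors below stand in for real MoLFormer embeddings, and the names `TOY_INDEX`, `cosine_similarity`, and `nearest_neighbors`, as well as the choice of cosine similarity as the distance measure, are assumptions made for illustration.

```python
import math
from typing import Dict, List, Tuple

# Hypothetical index: SMILES string -> (embedding, metadata).
# Assumption: these hand-written 3-d vectors are placeholders; real
# embeddings would be produced by the model and be much higher-dimensional.
TOY_INDEX: Dict[str, Tuple[List[float], Dict[str, str]]] = {
    "CCO": ([1.0, 0.0, 0.0], {"name": "ethanol"}),
    "CCCO": ([0.9, 0.1, 0.0], {"name": "1-propanol"}),
    "c1ccccc1": ([0.0, 1.0, 0.0], {"name": "benzene"}),
    "CC(=O)O": ([0.0, 0.0, 1.0], {"name": "acetic acid"}),
}


def cosine_similarity(a: List[float], b: List[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def nearest_neighbors(
    query: List[float],
    index: Dict[str, Tuple[List[float], Dict[str, str]]],
    k: int = 2,
) -> List[Tuple[str, float, Dict[str, str]]]:
    """Return the k most similar entries as (smiles, similarity, metadata)."""
    scored = [
        (smiles, cosine_similarity(query, vec), meta)
        for smiles, (vec, meta) in index.items()
    ]
    # Brute-force ranking: highest similarity first.
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:k]


if __name__ == "__main__":
    # Placeholder query embedding, chosen to lie close to the "CCO" vector.
    for smiles, score, meta in nearest_neighbors([1.0, 0.05, 0.0], TOY_INDEX):
        print(smiles, round(score, 4), meta["name"])
```

The brute-force scan is only practical for small collections; at the scale mentioned in the abstract, a real service would presumably rely on an approximate nearest-neighbor index, though the source does not say which one.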

Author Information

Payel Das (IBM Research)
Brian Belgodere (IBM Research)
