Expo Demonstration
La Nouvelle Orleans Ballroom C (level 2)

We present MoLFormer, a large chemical language model: an efficient transformer encoder over chemical SMILES that uses rotary positional embeddings. The model employs a linear attention mechanism, coupled with highly distributed self-supervised training, on SMILES sequences of 1.1 billion molecules from the PubChem and ZINC datasets. Experiments demonstrate the generality of this molecular representation through its performance on several molecular property classification and regression tasks. Further analyses, specifically through the lens of attention, demonstrate the emergence of spatial relationships between atoms within MoLFormer trained on chemical SMILES. We further present a cloud-based real-time platform that allows users to virtually navigate the chemical space and screen molecules of interest. The platform leverages molecular embeddings inferred from MoLFormer and retrieves the nearest neighbors of an input chemical along with their metadata. Based on the functionalities of this platform and the results obtained, we believe such a platform adds value in automating chemistry and assists in drug discovery and material design tasks.
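The retrieval step of such a platform can be sketched as a cosine-similarity nearest-neighbor search over precomputed molecular embeddings. The sketch below is a minimal illustration under assumed names and toy data, not the authors' implementation, which may use an approximate-nearest-neighbor index in practice.

```python
import numpy as np

def top_k_neighbors(query, embeddings, k=3):
    """Return indices of the k embeddings most similar to the query
    by cosine similarity (hypothetical helper, illustrative only)."""
    q = query / np.linalg.norm(query)
    e = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    sims = e @ q                      # cosine similarity to each stored embedding
    return np.argsort(-sims)[:k]     # indices sorted by decreasing similarity

# Toy stand-ins for MoLFormer embeddings of three molecules.
embeddings = np.array([[1.0, 0.0],
                       [0.0, 1.0],
                       [0.7, 0.7]])
query = np.array([1.0, 0.1])         # embedding of the input chemical
neighbors = top_k_neighbors(query, embeddings, k=2)
print(neighbors)                     # → [0 2]
```

The returned indices would then be used to look up the neighboring molecules' SMILES strings and metadata for display.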