We present MoLFormer, a large chemical language model: an efficient transformer encoder trained on chemical SMILES that uses rotary positional embeddings. The model employs a linear attention mechanism, coupled with highly distributed self-supervised training, on SMILES sequences of 1.1 billion molecules from the PubChem and ZINC datasets. Experiments show the generality of this molecular representation through its performance on several molecular property classification and regression tasks. Further analyses, specifically through the lens of attention, demonstrate the emergence of spatial relationships between atoms within MoLFormer trained on chemical SMILES. We further present a cloud-based real-time platform that allows users to virtually navigate the chemical space and screen molecules of interest. The platform leverages molecular embeddings inferred from MoLFormer and retrieves nearest neighbors and their metadata for an input chemical. Based on the functionality of this platform and the results obtained, we believe that such a platform adds value in automating chemistry and assists in drug discovery and material design tasks.
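The nearest-neighbor retrieval step described above can be illustrated with a minimal sketch. It assumes embeddings are fixed-length vectors compared by cosine similarity; the abstract does not state which distance metric or index the platform uses, and the function name `nearest_neighbors`, the SMILES strings, and the three-dimensional vectors below are hypothetical toy stand-ins rather than MoLFormer outputs.

```python
import numpy as np

def nearest_neighbors(query, embeddings, k=3):
    """Return indices and cosine similarities of the k rows closest to `query`."""
    # Normalize so that a dot product equals cosine similarity.
    q = query / np.linalg.norm(query)
    e = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    sims = e @ q
    # Sort by descending similarity and keep the top k.
    top = np.argsort(-sims)[:k]
    return top, sims[top]

# Toy database: SMILES strings with made-up embeddings (assumption, not model output).
smiles = ["CCO", "CCN", "c1ccccc1", "CC(=O)O"]
embeddings = np.array([
    [1.0, 0.0, 0.0],
    [0.9, 0.1, 0.0],
    [0.0, 1.0, 0.0],
    [0.5, 0.0, 0.5],
])

# Hypothetical embedding of the input chemical.
query = np.array([1.0, 0.0, 0.0])

idx, scores = nearest_neighbors(query, embeddings, k=2)
neighbors = [smiles[i] for i in idx]
print(neighbors, scores)
```

A brute-force scan like this is only practical for small collections; at the scale of a billion molecules a platform would presumably rely on an approximate nearest-neighbor index, though the abstract gives no detail on that choice.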
Author Information
Payel Das (IBM Research)
Brian Belgodere (IBM Research)
More from the Same Authors
2021 : Accurate Multi-Endpoint Molecular Toxicity Predictions in Humans with Contrastive Explanations »
Bhanushee Sharma · Vijil Chenthamarakshan · Amit Dhurandhar · James Hendler · Jonathan S. Dordick · Payel Das -
2021 : Sample-Efficient Generation of Novel Photo-acid Generator Molecules using a Deep Generative Model »
Samuel Hoffman · Vijil Chenthamarakshan · Dmitry Zubarev · Daniel Sanders · Payel Das -
2021 : Grapher: Multi-Stage Knowledge Graph Construction using Pretrained Language Models »
Igor Melnyk · Pierre Dognin · Payel Das -
2022 : Reducing Down(stream)time: Pretraining Molecular GNNs using Heterogeneous AI Accelerators »
Jenna A Bilbrey · Kristina Herman · Henry Sprueill · Sotiris Xantheas · Payel Das · Manuel Lopez Roldan · Mike Kraus · Hatem Helal · Sutanay Choudhury -
2022 : Consistent Training via Energy-Based GFlowNets for Modeling Discrete Joint Distributions »
Chanakya Ekbote · Moksh Jain · Payel Das · Yoshua Bengio -
2022 : Panel »
Pin-Yu Chen · Alex Gittens · Bo Li · Celia Cintas · Hilde Kuehne · Payel Das -
2022 : SynBench: Task-Agnostic Benchmarking of Pretrained Representations using Synthetic Data »
Ching-Yun Ko · Pin-Yu Chen · Jeet Mohapatra · Payel Das · Luca Daniel -
2021 Poster: Predicting Deep Neural Network Generalization with Perturbation Response Curves »
Yair Schiff · Brian Quanz · Payel Das · Pin-Yu Chen -
2021 Poster: Mean-based Best Arm Identification in Stochastic Bandits under Reward Contamination »
Arpan Mukherjee · Ali Tajer · Pin-Yu Chen · Payel Das -
2020 : Spotlight: Characterizing the Latent Space of Molecular Generative Models with Persistent Homology Metrics »
Yair Schiff · Payel Das · Vijil Chenthamarakshan · Karthikeyan Natesan Ramamurthy -
2020 Poster: A Decentralized Parallel Algorithm for Training Generative Adversarial Nets »
Mingrui Liu · Wei Zhang · Youssef Mroueh · Xiaodong Cui · Jarret Ross · Tianbao Yang · Payel Das -
2020 : Spotlight on women at IBM Research »
Lisa Amini · Francesca Rossi · Celia Cintas · Payel Das -
2020 Poster: CogMol: Target-Specific and Selective Drug Design for COVID-19 Using Deep Generative Models »
Vijil Chenthamarakshan · Payel Das · Samuel Hoffman · Hendrik Strobelt · Inkit Padhi · Kar Wai Lim · Benjamin Hoover · Matteo Manica · Jannis Born · Teodoro Laino · Aleksandra Mojsilovic -
2020 : CogMol: Target-Specific and Selective Drug Design for COVID-19 Using Deep Generative Models »
Payel Das -
2020 Poster: Optimizing Mode Connectivity via Neuron Alignment »
Norman J Tatro · Pin-Yu Chen · Payel Das · Igor Melnyk · Prasanna Sattigeri · Rongjie Lai -
2020 Expo Talk Panel: AI against COVID-19 at IBM Research »
Divya Pathak · Payel Das · Michal Rosen-Zvi · Salim Roukos -
2018 : Contributed Work »
Thaer Moustafa Dieb · Aditya Balu · Amir H. Khasahmadi · Viraj Shah · Boris Knyazev · Payel Das · Garrett Goh · Georgy Derevyanko · Gianni De Fabritiis · Reiko Hagawa · John Ingraham · David Belanger · Jialin Song · Kim Nicoli · Miha Skalic · Michelle Wu · Niklas Gebauer · Peter Bjørn Jørgensen · Ryan-Rhys Griffiths · Shengchao Liu · Sheshera Mysore · Hai Leong Chieu · Philippe Schwaller · Bart Olsthoorn · Bianca-Cristina Cristescu · Wei-Cheng Tseng · Seongok Ryu · Iddo Drori · Kevin Yang · Soumya Sanyal · Zois Boukouvalas · Rishi Bedi · Arindam Paul · Sambuddha Ghosal · Daniil Bash · Clyde Fare · Zekun Ren · Ali Oskooei · Minn Xuan Wong · Paul Sinz · Théophile Gaudin · Wengong Jin · Paul Leu -
2018 Poster: Explanations based on the Missing: Towards Contrastive Explanations with Pertinent Negatives »
Amit Dhurandhar · Pin-Yu Chen · Ronny Luss · Chun-Chen Tu · Paishun Ting · Karthikeyan Shanmugam · Payel Das