MoleculeCLIP: Learning Transferable Molecule Multi-Modality Models via Natural Language
Shengchao Liu · Weili Nie · Chengpeng Wang · Jiarui Lu · Zhuoran Qiao · Ling Liu · Jian Tang · Anima Anandkumar · Chaowei Xiao

Recently, artificial intelligence for drug discovery has attracted increasing interest in the community. One of the key challenges is to learn a powerful molecule representation. To achieve this goal, existing works focus on learning molecule representations from molecular chemical structures (i.e., 1D description, 2D topology, or 3D geometry). However, such representations generalize poorly to unseen tasks. Meanwhile, humans can learn hierarchical and multi-modality information, including molecular chemical structure and natural language (e.g., biomedical text), simultaneously and can generalize to new concepts. Motivated by this observation, in this paper we explore the use of text for drug discovery. We design a multi-modality model, MoleculeCLIP, that leverages natural language and molecule structure. MoleculeCLIP consists of two branches: a chemical structure branch that encodes the chemical structures and a textual description branch that encodes the corresponding natural-language descriptions. To train it, we first collect a large-scale dataset with more than 280k text and molecule pairs, called PubChemCLIP. It is about 28× larger than the existing dataset. We then train our model on this dataset using a contrastive learning strategy to bridge the representations from the two branches. We carefully design two categories of zero-shot downstream tasks, the retrieval task and the language-guided editing task, through which we highlight three key features of introducing language in MoleculeCLIP: the open vocabulary, the compositionality, and the domain knowledge exploration. In extensive experiments, MoleculeCLIP quantitatively outperforms the existing methods on 6 zero-shot retrieval tasks and 24 zero-shot language-guided molecule editing tasks. Qualitatively, we show that MoleculeCLIP can understand domain information by successfully detecting the key structures referred to in the text prompts. Furthermore, the representation learned by MoleculeCLIP can be used to further boost performance on an existing task, molecular property prediction.
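
The abstract says only that the two branches are bridged with "the contrastive learning strategy" and evaluated on zero-shot retrieval; it does not give the loss or the encoders. As a rough illustration, the sketch below uses a symmetric InfoNCE-style objective in the manner of CLIP, which is an assumption and may differ from the authors' actual objective. The function names, the temperature of 0.1, and the use of NumPy arrays in place of real encoder outputs are all hypothetical.

```python
import numpy as np

def l2_normalize(x, eps=1e-12):
    # Scale each row to unit length so dot products become cosine similarities.
    return x / (np.linalg.norm(x, axis=1, keepdims=True) + eps)

def log_softmax(logits, axis):
    # Numerically stable log-softmax along the given axis.
    shifted = logits - logits.max(axis=axis, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=axis, keepdims=True))

def contrastive_loss(mol_emb, text_emb, temperature=0.1):
    # Assumed symmetric InfoNCE-style loss: row i of mol_emb and row i of
    # text_emb are a matched pair; every other row in the batch is a negative.
    mol = l2_normalize(mol_emb)
    text = l2_normalize(text_emb)
    logits = mol @ text.T / temperature
    idx = np.arange(len(mol))
    loss_mol_to_text = -log_softmax(logits, axis=1)[idx, idx].mean()
    loss_text_to_mol = -log_softmax(logits, axis=0)[idx, idx].mean()
    return 0.5 * (loss_mol_to_text + loss_text_to_mol)

def retrieve(query_text_emb, candidate_mol_embs):
    # Zero-shot retrieval sketch: return the index of the candidate molecule
    # whose embedding has the highest cosine similarity to the text query.
    query = l2_normalize(query_text_emb[None, :])
    candidates = l2_normalize(candidate_mol_embs)
    return int(np.argmax(candidates @ query.T))

# Toy stand-ins for encoder outputs (hypothetical data, not from the paper).
rng = np.random.default_rng(0)
mol_embeddings = rng.normal(size=(4, 8))
aligned_loss = contrastive_loss(mol_embeddings, mol_embeddings)
shuffled_loss = contrastive_loss(mol_embeddings, mol_embeddings[::-1])
best_match = retrieve(mol_embeddings[2], mol_embeddings)
```

In this toy setup the loss is lower when each text embedding matches its own molecule than when the pairing is reversed, which is the behavior a contrastive objective is meant to reward.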

Author Information

Zhuoran Qiao (Caltech)

I'm a Ph.D. student in the Miller Group at Caltech CCE. I develop deep learning methods that incorporate prior physical information to study challenging problems in molecular electronic structure and dynamics.

Chaowei Xiao (ASU/NVIDIA)

I am Chaowei Xiao, a third-year PhD student in the CSE Department, University of Michigan, Ann Arbor. My advisor is Professor Mingyan Liu. I obtained my bachelor's degree from the School of Software at Tsinghua University in 2015, advised by Professor Yunhao Liu, Professor Zheng Yang and Dr. Lei Yang. I was also a visiting student at UC Berkeley in 2018, advised by Professor Dawn Song and Professor Bo Li. My research interests include adversarial machine learning.